Caver Docs, Etairos.ai

v11.25 · caver-collector v1.175

The end of the volume-based SIEM tax.

Caver is the complete security operations platform built to retire the expensive, per-GB SIEM model of the last decade. One open OCSF Parquet lakehouse holds your data. Detection, SOAR, ITSI, UBA, and AI security run on top as integrated layers. We work with everything you already use, federate with most search tools (Splunk, Elastic, Sentinel, Sumo, Datadog), support every major query language, and deploy faster than anything else on the market.

v11.25

Caver Core

v1.175

Collector

123

Content Packs

4,000+

Detection Rules

Five integrated layers

Caver Core →

Multi-language query engine on OCSF Parquet. Supports SPL, SQL, KQL, Sigma, PromQL, natural language. Federates with most search tools (Splunk, Elastic, Sentinel, more).

CAVERN Security for Enterprise →

Risk-based alerting, ATT&CK-mapped detection rules, 123 content packs, 100% SigmaHQ corpus mapped, full Detection IDE with live backtest.

SLAM SOAR →

Notables, operator-editable YAML playbooks, case management, evidence preservation, signed PDF exports. Native Telegram, Slack, Teams oncall routing.

ECHO ITSI →

Service trees, KPIs, episode correlation. Health scores per service. Real-time Grafana integration.

UBA →

Per-entity behavioral baselines with admin and criticality multipliers. 65+ unsupervised anomaly models. Threat assembly across identity, endpoint, network, cloud.

Intelligence →

Claude AI agent layer with 13 wrapped operator primitives. Natural-language config, rule authoring, playbook dispatch, compliance reporting via MCP.

Companion products

Caver-Collector →

Security data pipeline. Python + Vector (Rust) + OTel. 21 receivers, 44 transforms, 22 sinks, 70+ SaaS adapters. v1.175.

Caver Forge →

CVE → detection content auto-generator. Scrapes new CVEs from 10 feeds, drafts Sigma + SPL via Claude grounded in your OCSF schema, stages findings. ~$0.01 per CVE.

Caver Industrial →

OT / ICS plugin. Dragos + Claroty integration. BACnet, DNP3, Modbus, IEC 104, S7Comm, Ethernet/IP. Per-deployment pricing.

Caver AI Observatory →

AI / LLM security monitoring. 24 content packs, 200+ purpose-built CAVERN rules covering prompt injection, shadow AI, agent abuse, supply chain.

Caver Core

Caver is a language-agnostic security data platform. The storage layer is open OCSF Parquet on any S3-compatible object store. The query layer speaks whatever language your team prefers, SPL, SQL, KQL, PromQL, or natural language via AI agents.

Architecture

The central concept: your security data lives as OCSF Parquet in a bucket you control. Caver sits on top, exposing multiple query interfaces and running detection, SOAR, ITSI, and UBA logic against the same data lake.

DATA SOURCES

Query Languages

Caver exposes a unified security data layer that any query interface can address. There is no lock-in to a single language, teams pick the tool they already know.

Caver UI

The native operator console at /ui. Speaks every supported language transparently, type SPL, SQL, KQL, Sigma, PromQL, or natural language and caver routes to the right engine. Built-in search lab, Detection IDE, SLAM notables, ECHO service trees.

SPL

Native Splunk Processing Language. Caver registers as a standard distributed search peer, existing dashboards, saved searches, and ES correlation searches run unchanged. Zero re-training for Splunk teams.

SQL

Standard ANSI SQL via DuckDB, Trino, and AWS Athena. Point any SQL client directly at the OCSF Parquet lake. Full window functions, JOINs, and aggregations across the full event history.

KQL

Kusto Query Language (Azure / Microsoft Sentinel compatible). Security teams familiar with KQL from Azure Sentinel or ADX can run KQL queries against Caver's OCSF lake without changing their workflows.

Sigma

The community standard for vendor-agnostic detection rules. Caver ships a full Sigma-to-CAVERN transpiler and 100% of the SigmaHQ release corpus mapped. Write once, run everywhere.

PromQL / Grafana

Point Grafana at Caver's metrics endpoint for dashboards and alerts. ECHO service health, UBA anomaly scores, and CAVERN firing rates are all exposed as Prometheus metrics.

Natural Language (MCP)

Describe what you want in plain language. The Intelligence orchestrator translates it to a query, executes it, and returns a plain-language answer, via Claude, GPT, or any MCP-compatible AI agent.

-- Same data, any language
-- SPL
index=aws_cloudtrail sourcetype=aws:cloudtrail eventName=ConsoleLogin
  | stats count by userIdentity.userName | sort -count

-- SQL (DuckDB / Trino)
SELECT actor_user_name, COUNT(*) as logins
FROM caver_lake.aws_cloudtrail
WHERE class_uid = 3002 AND activity_name = 'Logon'
GROUP BY 1 ORDER BY 2 DESC

-- KQL
CaverLake
| where class_uid == 3002 and activity_name == "Logon"
| summarize logins=count() by actor_user_name
| order by logins desc

Ingest Paths

CAVERN Security for Enterprise

CAVERN is the detection engine inside Caver. It replaces Splunk Enterprise Security with risk-based alerting, ATT&CK-mapped correlation rules, and 123 out-of-box content packs covering 4,000+ detection rules.

Detection Rules

Every CAVERN rule is a YAML file with a Sigma-compatible match block, an optional query override in any supported language (SPL, SQL, KQL, PromQL, raw Sigma), risk score, ATT&CK tags, and at least one fixture for CI validation.

id: cavern.ai_usage.prompt_injection_candidate
enabled: true
title: Prompt-injection candidate in LLM request
severity: high
score: 65
attack: [T1059, T1190]
match:
  selection:
    sourcetype: ai_usage
    prompt|contains:
      - "ignore previous instructions"
      - "forget the system prompt"
  condition: selection

Sigma Integration

Caver ships full first-class Sigma support, the community standard for vendor-agnostic detection rules.

SigmaHQ corpus

100% of the SigmaHQ/sigma release corpus mapped, every published rule transpiles into CAVERN at release time. All ship enabled: false; enable per-category after tuning thresholds for your environment.

caver-sigma CLI

caver-sigma import ./rules/ -o ./cavern/ to transpile any Sigma rule file or directory. caver-sigma stats ./rules/ for transpile coverage report.

CI auto-sync

The sigma-sync.yml GitHub Actions workflow polls SigmaHQ releases weekly. When a new release drops, it transpiles, opens a PR with a coverage delta table, and waits for your review.

# Transpile a single Sigma rule
caver-sigma import ./my_rule.yml

# Import the full SigmaHQ corpus
caver-sigma import ~/sigma/rules/ -o src/caver/cavern/content/v1/sigma/

# Coverage report
caver-sigma stats ~/sigma/rules/
# → 2960 transpiled (94.5%), 172 skipped (aggregations, field-refs)

Risk-Based Alerting

CAVERN's RBA model accumulates per-entity risk scores over a rolling window, then creates a SLAM notable when an entity's score crosses the threshold.

Rule firescontributes score

→

RBA aggregatorsums per entity, 24h window

→

Multipliers appliedadmin 2× · crit-prod 1.875×

→

Threshold crossedSLAM notable created

Multiplier type	Factor
Admin account	2.0×
Critical production asset	1.875×
Service account	0× (suppressed)
Default (standard user)	1.0×

Content Packs (123)

Category	Packs	Rules
AI / LLM	ai_usage · ai_coding_assistants · shadow_ai · langflow · managed_llm · litellm · portkey · cloudflare_ai_gateway · mcp · vector_db · embeddings · rag_pipelines · agent_frameworks · browser_agents · voice_ai · image_generation · fine_tuning · on_prem_inference · huggingface · model_supply_chain · prompt_library_drift · ai_governance · ai_cost_governance · ai_data_governance	200+
ATT&CK / Endpoint	art_endpoint · credential_access · defense_evasion · discovery · initial_access · lateral_movement · persistence · command_and_control · execution · privilege_escalation · collection · exfiltration · impact	240+
Host telemetry	win_security_eventlog · sysmon · linux_auditd · powershell_scriptblock · osquery · yara · clamav	140+
Edge / Network	edge_security · network · network_infrastructure · firewall_syslog · rmm_tools	75+
OT / ICS	industrial_bacnet · industrial_dnp3 · industrial_modbus · industrial_iec104 · industrial_s7comm · industrial_ethernet_ip · scada_windows · it_ot_correlation	95+
Cloud platforms	aws_cloudtrail · aws_guardduty · azure_ad · google_workspace · k8s · m365_audit · devsec_correlation · cloud_correlation	220+
EDR / Endpoint	crowdstrike_falcon · sentinelone · microsoft_defender · wazuh · sysdig_secure · edr_common · edr_to_cloud_correlation	130+
SaaS audit (70+ adapters)	okta · slack_audit · github_audit · salesforce_audit · zoom_operations · and 65+ more	380+
Community (Sigma)	Auto-transpiled at every SigmaHQ release, 100% of corpus mapped	2,960+

Detection IDE

The built-in Detection IDE at /ui/cavern-detection-ide lets you author, lint, backtest, and diff rules without touching the filesystem. Write a rule, paste a sample event, see it fire or not fire, live. Auto-tune proposes threshold adjustments based on historical FP/TP rates; apply with one click and roll back if needed.

Supported Technologies

Caver ships out-of-box integrations for 121+ vendors and protocols across security, identity, infrastructure, and OT. Every integration includes a caver-collector receiver/adapter and at least one CAVERN content pack with detection rules tuned to the source's event shape.

121+

Integrations

11

Cloud Platforms 12

AWS

Microsoft Azure

Google Cloud

Cloudflare

DigitalOcean

Linode

Vercel

Netlify

Fastly

HashiCorp

Terraform

Kubernetes

Identity & SSO 10

Okta

Auth0

JumpCloud

Duo Security

OneLogin

Keycloak

1Password

Bitwarden

Azure AD / Entra

Google Workspace

Productivity & Collaboration 12

Microsoft 365

Slack

Microsoft Teams

Zoom

Cisco Webex

Mattermost

Discord

Intercom

Box

Dropbox

Notion

Asana

Developer & DevOps 14

GitHub

GitLab

Jira

Confluence

CircleCI

Buildkite

LaunchDarkly

Snyk

Linear

Tines

PostHog

Sysdig

MongoDB Atlas

Snowflake

EDR & Endpoint 10

CrowdStrike

SentinelOne

Microsoft Defender

Carbon Black

Trend Micro

Wazuh

ClamAV

YARA

osquery

Falco

Network & Perimeter 11

Tailscale

Cloudflare Zero Trust

Cisco Meraki

Cisco Umbrella

Fortinet

Palo Alto Networks

Ivanti

Citrix

Suricata

Zeek

Wireshark

Cloud Security 8

Wiz

Lacework

Tenable

Sysdig Secure

Microsoft Sentinel

AWS GuardDuty

AWS SecurityHub

Trivy

AI & LLM 12

OpenAI

Anthropic

Microsoft Copilot

Amazon Bedrock

Azure OpenAI

Google Vertex AI

Hugging Face

LiteLLM

Portkey

LangFlow

Ollama

LM Studio

OT / ICS 7

Dragos

Claroty

Siemens

Rockwell / Allen-Bradley

Schneider Electric

ABB

Honeywell

Sales, CRM & Support 15

Salesforce

HubSpot

Stripe

Shopify

Zendesk

ServiceNow

PagerDuty

Twilio

SendGrid

Postmark

Mailgun

Mimecast

Atlassian Statuspage

Opsgenie

Datadog

Observability & Analytics 10

Splunk

Grafana

Prometheus

Elastic

Kibana

Loki

OpenTelemetry

Vector

Apache Kafka

Confluent

Looking for one you do not see?

If you do not see a tool you rely on, suggest it and we will scope it. New integrations land as a caver-collector receiver plus a CAVERN content pack, typically inside a single release cycle.

Already have a dashboard you love? Keep it. Caver registers as a peer on most search tools (Splunk, for example), so your existing dashboards, saved searches, and correlation rules keep running unchanged against the OCSF lake. Federation modes ship for Elastic / Kibana, Microsoft Sentinel, Sumo Logic, Datadog, and growing. See deployment options →

SLAM SOAR

SLAM (Security Localization And Mapping) is Caver's SOAR layer. It handles the full lifecycle from CAVERN notable creation through analyst triage, case management, evidence collection, and compliance-ready reporting.

Notables

A notable is created when an entity's RBA score crosses the configured threshold, or when a rule fires directly at high severity. Each notable carries the contributing rules, scores, entity identity, and a timeline of events.

CAVERN rule firesor RBA threshold crossed

→

Notable createdentity · score · contributing rules

→

Playbook executesenrichment · routing

→

Oncall alertedTelegram · Slack · Teams

Playbooks

Operator-editable YAML. Shipped playbooks cover AI usage, host telemetry, cloud privilege escalation, edge appliance exploitation, and OT/ICS anomalies. Playbook steps are composable, mix enrichment, routing, alerting, and case creation freely.

name: ai_usage_critical
trigger:
  notable_severity: [critical]
  pack: ai_usage
steps:
  - enrich_identity:
      field: actor.user.name
  - alert_telegram:
      message: "🔴 {title}, {actor.user.name} ({actor.department})"
  - create_case:
      priority: P1
      sla_minutes: 15
  - tag_notable:
      tags: [ai-security, auto-cased]

Cases

Cases aggregate related notables, attach evidence (Parquet exports, network captures, screenshots, forensic images) into the Evidence Locker with full chain of custody, track SLA timers, and export signed PDF reports for compliance, legal, or post-incident review. SOC KPI dashboards surface MTTD, MTTR, and false-positive rates per content pack.

Oncall Integration

Native, used by default. Rich messages with entity context, score, contributing rules, and one-tap case creation links.

Slack

Block kit messages with action buttons for acknowledge, escalate, and snooze. Two-way integration, replies update case status.

Teams

Adaptive card messages with approval flows. Both Slack and Teams ship together, never one without the other.

Evidence Locker

The Evidence Locker is Caver's tamper-evident, audit-logged store for any artifact that needs to survive an incident's full legal lifetime. It pairs with SLAM cases, CAVERN notables, ECHO episodes, and compliance reports. Every artifact lands with a SHA-256 hash, a timestamped signature, and an immutable audit trail. The same artifact can be exported decades later with full chain-of-custody verification.

WORM

Object Lock backed

SHA-256

Per-artifact hash

Ed25519

Signed timestamps

100%

Audit-logged access

Architecture

The Locker is a thin write-once layer over standard S3-compatible object storage. Caver writes artifacts with S3 Object Lock in compliance mode (legal-grade WORM). A separate ledger tracks every read, write, hold, and export. The ledger itself is appended-only and Merkle-tree hashed for integrity.

EVIDENCE LIFECYCLE

Evidence Types

The Locker is artifact-agnostic. Anything you can serialise to bytes lands as evidence with the same chain-of-custody guarantees.

Event slices (OCSF Parquet)

A timeboxed slice of the OCSF lake covering the incident window for a given entity. SLAM cases attach these automatically when a notable is created. Time-bounded, entity-bounded, immutable.

PCAP / network captures

Triggered by SLAM playbook step or attached manually. The Locker handles compression, hash, retention. Up to 4 TB per artifact (S3 multipart).

Endpoint forensic images

Disk + memory captures from EDR (CrowdStrike, SentinelOne, Defender) routed through the SLAM forensic_acquire playbook step. Standard E01 / AFF4 / raw supported.

Screenshots

Browser console, dashboard state, analyst-annotated screen captures. Attached from the SLAM case UI. Caver records the analyst identity + capture timestamp in the audit ledger.

File samples

Suspected malware, dropped binaries, exfiltrated documents. Quarantine flag on by default so the file is not retrieved without an explicit secondary auth step.

Analyst notes

Free-text narrative attached to a case. Stored as signed Markdown with author identity. Edits create new versions, never overwrite, so the historical analyst voice is preserved.

Identity context snapshot

Frozen copy of the actor's identity graph (manager, department, group memberships) at incident time. Identity may change later, evidence preserves what was true the moment the notable fired.

Asset state snapshot

Frozen copy of the affected asset's CMDB record + tags + service tree position at incident time. Same logic as identity, asset reality drifts, evidence preserves the original.

Chain of Custody

Every artifact is admissible-ready. Caver records who touched it, when, and with what cryptographic proof. The chain survives operator turnover, vendor swaps, and platform migrations.

Field	What it proves
`artifact_sha256`	The bytes have not been altered since the moment of ingestion
`ingest_timestamp` + Ed25519 signature	The artifact existed in Caver no later than this moment, signed by a Caver-controlled key
`ingest_actor`	Which user / playbook / API caller deposited the artifact, with their IdP identity at time of write
`ingest_source_event_id`	Which CAVERN notable / SLAM case / ECHO episode triggered the deposit, fully linkable
`ledger_merkle_root`	The state of the entire Locker ledger at the moment of ingestion, so the ledger itself cannot be edited without detection
`access_log`	Every subsequent read, including who, when, from what IP, with what justification (free-text + dropdown)
`retention_policy`	The retention class that applied at ingest, including any legal holds layered on later
`verification_payload`	Bundled SHA-256 + signature + ledger proof, downloadable as a standalone JSON for independent verification

Independent verification

Caver ships caver-evidence-verify, a standalone CLI binary signed and distributed with the product. Point it at an exported evidence bundle and it re-computes the SHA-256, validates the Ed25519 signature, walks the Merkle proof, and exits 0 if the chain is intact. The verifier runs offline. A defence expert, opposing counsel, or auditor can confirm chain integrity without access to your Caver deployment.

caver-evidence-verify --bundle case-2026-04127.evidence.zip
# → verifying 41 artifacts...
# → 41/41 SHA-256 valid
# → 41/41 Ed25519 signature valid
# → ledger Merkle proof valid (root: 0x4f3a...)
# → chain of custody intact through 2026-04-18T14:23:01Z
# → exit 0

Retention & Legal Hold

Retention is policy-driven and per-evidence-class. Legal holds layer on top and freeze deletion until the hold is released by an authorised user. Every hold action is logged.

Policy class	Default retention	Common use
`p1_critical`	10 years	P1 / P2 incidents, regulated industries
`standard`	7 years	Default for all SLAM cases
`operational`	3 years	Low-severity tuning / FP investigations
`research`	1 year	Threat-hunt artifacts, intel-gathering
`legal_hold`	Indefinite	Applied on top of any class, blocks deletion until released
`regulatory_lock`	Per-policy (HIPAA 6y, PCI 1y after closure, SOX 7y)	Auto-applied when the affected asset is tagged with the compliance regime

Legal hold workflow

Counsel requests holdcase id or entity name

→

Authorised user appliestwo-person rule, dual-auth

→

All artifacts frozenretention extended indefinitely

→

Counsel releasesdual-auth, audit-logged

Export Formats

Signed PDF report

Single-document narrative bundle with embedded artifact references + thumbnails + Ed25519 PDF signature. Default deliverable for executive / board reporting + compliance audit.

Signed ZIP bundle

All raw artifacts + signed manifest + verification payload + independent verifier binary. Default deliverable for legal handover, expert review, or chain-of-custody-required compliance.

STIX 2.1 export

Threat-intel oriented export. Indicators, infrastructure, attack patterns, identity, and observed-data objects bundled into a STIX package for sharing with ISACs or partner SOCs.

VERIS / DBIR submission

Pre-formatted VERIS JSON for submission to the Verizon DBIR dataset, with PII automatically redacted. Optional anonymisation passes for public benchmark contribution.

SOC 2 / ISO 27001 evidence pack

Per-control evidence bundles mapped to SOC 2 Trust Service Criteria and ISO 27001 Annex A. Drop into your auditor's portal directly.

Cold archive to Glacier

Push closed-case evidence to S3 Glacier Deep Archive for very-long-term storage at cents-per-TB. Retrieval requires the dual-auth flow.

API & SDK

The Locker exposes a REST API and Python SDK for integration with custom workflows, SOAR connectors, and external case-management systems.

REST endpoints

# Deposit an artifact under a case
POST /api/locker/cases/<case_id>/artifacts
Content-Type: multipart/form-data
X-Caver-Justification: incident-response

# Retrieve an artifact (logged in access ledger)
GET /api/locker/artifacts/<artifact_id>?justification=...

# Apply legal hold (requires dual-auth)
POST /api/locker/holds
{"case_id": "...", "second_auth_token": "...", "reason": "..."}

# Export a case as signed PDF
POST /api/locker/cases/<case_id>/export?format=pdf

# Verify an exported bundle
POST /api/locker/verify
Content-Type: application/zip

Python SDK

from caver_sdk import Locker

vault = Locker(endpoint="https://caver.internal", token=token)

# Deposit a PCAP under a case
art = vault.deposit(
    case_id="2026-04127",
    artifact_type="pcap",
    body=pcap_bytes,
    justification="suspicious egress flow capture",
)
print(art.sha256, art.locker_uri)

# Apply legal hold with dual-auth
vault.apply_hold(
    case_id="2026-04127",
    reason="pending litigation, see legal-2026-31",
    second_auth_token=counsel_token,
)

# Export and download a signed PDF
pdf_bytes = vault.export(case_id="2026-04127", format="pdf")
open("incident-report.pdf", "wb").write(pdf_bytes)

Configuration

[evidence_locker]
bucket = "caver-evidence-locker"
object_lock_mode = "compliance"   # or "governance"
retention_default = "standard"   # see retention table
signing_key_env = "CAVER_EVIDENCE_SIGNING_KEY"   # Ed25519 private key
ledger_anchor_cadence = "1h"   # how often to write the Merkle root to a public timestamping authority

[evidence_locker.acl]
deposit = ["role:analyst", "role:playbook"]
read = ["role:analyst", "role:lead", "role:legal"]
hold = ["role:legal", "role:ciso"]   # dual-auth enforced
export = ["role:lead", "role:legal"]

[evidence_locker.retention.regulatory]
hipaa = "6y_after_closure"
pci = "1y_after_closure"
sox = "7y_after_closure"
gdpr_subject_rights = "subject_request_override"   # allow regulator-driven deletion

ECHO ITSI

ECHO is Caver's service intelligence layer. It replaces Splunk ITSI with service trees, KPI tracking, episode correlation, and real-time health scores, all built on the same OCSF Parquet lake.

∞

Service depth

12

KPI types

5

Health states

Realtime

Grafana metrics

Service Trees

Define services hierarchically with parent/child relationships. Each node has a health score derived from its KPIs and contributing notables. Health propagates upward, a degraded leaf impacts every ancestor.

EXAMPLE: PRODUCTION SERVICE TREE

KPI Types

Attach any SPL, SQL, or KQL query as a KPI on any service node. ECHO ships 12 built-in KPI types covering throughput, latency, error rate, saturation, and security signal.

KPI type	Use case	Default thresholds
`throughput_rps`	Requests per second	warn < 60% peer · crit < 30%
`latency_p50`	Median response time	warn > 2× baseline · crit > 5×
`latency_p95`	95th-percentile latency	warn > 2× · crit > 5×
`latency_p99`	99th-percentile latency	warn > 2× · crit > 5×
`error_rate`	HTTP 5xx %	warn > 1% · crit > 5%
`saturation`	CPU / memory / queue depth	warn > 70% · crit > 90%
`availability`	Uptime % over window	warn < 99.5% · crit < 99%
`notable_count`	SLAM notables per window	warn > 1 · crit > 3
`anomaly_count`	UBA anomalies per window	warn > 2 · crit > 5
`auth_failure_rate`	4xx auth fail %	warn > 5% · crit > 15%
`data_freshness`	Time since last event	warn > 5m · crit > 15m
`custom`	Any SPL/SQL/KQL expression	operator-defined

Episode Correlation

Correlated bursts of notables that share an entity, timeframe, and kill-chain stage become episodes. Episodes reduce alert volume from 40 individual notables down to one full-story incident with timeline.

Notable firescontributes to entity

→

Time window match15-min sliding default

→

Entity + kill-chain matchsame user / asset / phase

→

Episode createdfull timeline · all contributors

Grafana Integration

Every service health score, KPI value, and episode count is exposed as a Prometheus metric at /metrics. Point Grafana at the endpoint for real-time dashboards. PagerDuty / Opsgenie / Telegram alerting via Grafana Alertmanager.

# Service health gauge
caver_echo_service_health{service="api-gateway"} 31

# KPI values
caver_echo_kpi_value{service="api-gateway",kpi="latency_p99"} 4823
caver_echo_kpi_value{service="api-gateway",kpi="error_rate"} 7.2

# Episode count
caver_echo_episode_count{service="customer-platform",window="15m"} 2

UBA, User Behavior Analytics

Caver UBA builds per-entity behavioral baselines and surfaces anomalies no single rule could catch. It operates across identity, endpoint, network, cloud, and OT signals simultaneously, with 65+ unsupervised anomaly models and ATT&CK kill-chain mapping built in.

65+

Anomaly models

30d

Baseline window

4

Entity types

14

ATT&CK tactics

Per-entity Baselines

Baselines are computed per individual entity over a rolling 30-day window. Deviation is scored relative to each entity's own history, not a global threshold.

UBA SIGNAL ASSEMBLY

Anomaly Models (65+)

Caver UBA ships 65+ unsupervised models organised by signal type. Models run continuously; scores update on every event. No labelled training data required.

Identity (12 models)

First-time-app access · impossible travel · concurrent logon clusters · password spray patterns · MFA bypass · token reuse · session hijack · failed-then-success · privilege escalation · service account drift · disabled-account reactivation · stale cred surge.

Endpoint (14 models)

Rare process · parent-child chain anomaly · LOLBin spike · unsigned binary execution · privilege drop · credential dump shape · code injection signature · token impersonation · DLL sideload · scheduled task creation burst · WMI persistence · registry autorun change · driver load · LSASS access.

Network (10 models)

First-time-destination · beaconing detector · low-and-slow exfil · DGA shape · DNS tunneling · TLS fingerprint anomaly · port scan · lateral movement (SMB/RDP) · suspicious user-agent · geo deviation.

Cloud (12 models)

Unusual region access · cross-region role assumption · API call rate spike · IAM enumeration · S3 mass download · KMS key abuse · CloudTrail tamper · console-from-API user · service-link-role creation · disabled MFA admin · cross-account assumption · resource-deletion burst.

Data access (9 models)

Bulk export · sensitive-table access · off-baseline query volume · schema enumeration · cross-tenant query · column-mass-select · regex-extract on PII columns · dump-to-stage anomaly · query from new BI tool.

Peer-relative (8 models)

Departmental outlier · role-based outlier · manager-cluster deviation · time-of-day vs peer norm · resource set vs peer norm · access path vs peer norm · privilege footprint vs peer · session pattern vs peer.

Peer Group Analysis

UBA constructs peer groups from the identity graph: managers + reports, department members, role-based clusters, security-group membership. An action is "anomalous" not because it crosses a fixed threshold but because it is rare for this entity AND rare among its peers.

Peer source	Cluster basis	Used by models
Identity graph (Azure AD / Okta / Google)	Manager · department · title	Department-outlier · role-outlier · manager-cluster
Security groups	Group membership intersect	Privilege-footprint · resource-set
Access patterns	K-means on application access histogram (30d)	Access-path · session-pattern
Asset graph	Service tag · tier · environment	Asset-tier outlier

Threat Assembly

Individual anomalies are assembled into threat timelines when they share an entity, overlap in time, and span multiple ATT&CK phases. A timeline is one SLAM notable that contains the full attack story rather than 40 individual alerts.

Anomalies fireacross signal types

→

Entity + time matchsame user/asset, < 4h window

→

Kill-chain span≥ 2 ATT&CK tactics

→

Timeline createdSLAM P1 notable

ATT&CK Kill-Chain Mapping

Every anomaly model is tagged with the ATT&CK tactics it surfaces. Threat assembly preferentially groups anomalies that span multiple kill-chain phases, a high-confidence signal of an active intrusion.

ATT&CK tactic	UBA models that fire here
Initial Access (TA0001)	First-time-app · impossible travel · MFA bypass · DGA shape
Execution (TA0002)	Rare process · LOLBin spike · unsigned binary · code injection
Persistence (TA0003)	Registry autorun · scheduled task · disabled-account reactivation · service account drift
Privilege Escalation (TA0004)	Token impersonation · privilege escalation · LSASS access · driver load
Defense Evasion (TA0005)	Token reuse · DLL sideload · unsigned binary · CloudTrail tamper
Credential Access (TA0006)	Password spray · credential dump · LSASS access · KMS key abuse
Discovery (TA0007)	Port scan · IAM enumeration · schema enumeration · column-mass-select
Lateral Movement (TA0008)	SMB/RDP lateral · token impersonation · cross-account assumption
Collection (TA0009)	Bulk export · sensitive-table access · S3 mass download
Exfiltration (TA0010)	Low-and-slow exfil · DNS tunneling · dump-to-stage anomaly
Command & Control (TA0011)	Beaconing detector · DGA shape · DNS tunneling · TLS fingerprint anomaly
Impact (TA0040)	Resource-deletion burst · KMS key abuse · disabled MFA admin

Cross-source Correlation

UBA correlates anomalies across signal sources through the entity graph. An identity anomaly on alice@corp + an endpoint anomaly on alice-laptop (which Azure AD links to alice) + a cloud anomaly under role alice-iam become a single threat timeline scoring 3× their individual sum.

Identity + Asset Graph Enrichment

Every anomaly is enriched at fire time with the entity's full context:

Identity graph

Manager · department · title · group memberships · on/off-boarding date · MFA status · admin flag · service account flag · sensitivity classification.

Asset graph

Owner · tier (prod/staging/dev) · service tags · linked services · criticality classification · data classification (PII/PHI/PCI) · last patched · CVE exposure.

Intelligence & AI

Caver exposes its full capabilities to AI agents via the Model Context Protocol, and ships a built-in orchestrator for natural-language operator workflows.

MCP Server

Install caver[mcp-server] and connect any MCP-compatible AI client, Claude Desktop, Claude Code, GPT-4 with function calling, or a custom agent. The MCP server enforces Caver's auth model across all queries.

// Claude Desktop / Claude Code config
{
  "mcpServers": {
    "caver": {
      "command": "caver",
      "args": ["mcp", "--config", "/etc/caver/caver.toml"]
    }
  }
}

Available MCP tools: query (run SPL/SQL/KQL), search_rules (find CAVERN rules by name/ATT&CK), get_notables, get_case, trigger_playbook, get_entity_timeline, get_service_health.

Orchestrator, 13 primitives

The Intelligence orchestrator is a Claude-powered admin interface. Operators describe what they need; the orchestrator plans and dispatches across these primitives:

Primitive	Description
`onboard_sample`	Ingest a sample event → propose OCSF mapping → generate source config + rule + fixture
`nl_to_config`	Convert plain-language description to a caver.toml stanza or CAVERN rule YAML
`forge_rule`	Author a new CAVERN rule from a threat description or CVE
`replay_against_history`	Run a rule against historical Parquet data, count fires, estimate FP rate
`lint_config`	Validate caver.toml, CAVERN rule, or pipeline config against the schema
`enrich_text`	Look up an IP, domain, hash, or CVE in threat-intel feeds
`explain_rule_silence`	Why hasn't this rule fired in N days? (data gap, filter, threshold)
`tune_threshold`	Propose + apply a score/threshold change based on FP/TP history
`start_canary`	Route X% of traffic through a new rule/pipeline config for A/B validation
`pipeline_diff`	Compare two pipeline configs in plain language
`compliance_report`	Generate coverage map for SOC 2, PCI, HIPAA, controls mapped to CAVERN rules

Threat Intelligence

Caver ships first-class threat intelligence as a layer in the platform. Every supported feed normalises to OCSF 2003 (Threat Intelligence) or OCSF 2002 (Vulnerability Finding), populates the IOC bloom-filter for detection-time matching, and is queryable via the orchestrator's enrich_text primitive. Free open feeds ship enabled by default. Commercial feeds plug in with bring-your-own credentials.

15+

Default open feeds

8+

Commercial feeds

TAXII 2.1

Custom feed standard

Daily

Bloom-filter rebuild

Default Feeds (free, open, ship enabled)

These feeds ship enabled in a default Caver install. No commercial licence required, attribution preserved per source license.

Feed	What you get	Indicator types	Cadence
abuse.ch URLhaus	Malicious URLs, live + recent	URL · domain · IP	5m polling
abuse.ch MalwareBazaar	Malware samples, family + tags	SHA256 · YARA · family	real-time
abuse.ch ThreatFox	IOCs tied to malware family	IP · domain · URL · hash	5m polling
abuse.ch Feodo Tracker	Botnet C2 IPs, Emotet / Dridex / TrickBot / QakBot / IcedID / BumbleBee	IP · port · family	hourly
abuse.ch SSL Blacklist	Malicious SSL/TLS certificates	SHA1 · family	daily
abuse.ch YARAify	YARA rules + sample matches	YARA · hash	30m polling + webhook
LevelBlue OTX (formerly AlienVault)	Curated pulses with attribution + ATT&CK tagging	IP · domain · URL · hash · email · CVE	per-pulse subscription
OpenPhish Community	Verified live phishing URLs, last 5h	URL · target brand	5m polling
CISA Known Exploited Vulnerabilities (KEV)	CVEs with confirmed in-the-wild exploitation + remediation due dates	CVE · vendor · product	daily
CISA AIS (Automated Indicator Sharing)	STIX/TAXII 2.1 indicator feed, US government and partner indicators	STIX bundle (full)	real-time TAXII
CISA CSAF Advisories	Structured security advisories from CISA + ICS-CERT	CVE · vendor · product	real-time
CISA / US-CERT Alerts	Human-readable APT bulletins + threat alerts	narrative + indicators	RSS / Atom
MISP (community sharing)	Plug into any MISP instance you have access to	full MISP attribute set	per-event subscription
Emerging Threats Open	Proofpoint community Suricata rules + IOCs	signature · IP · domain	daily
Spamhaus DROP / EDROP / ASN-DROP	Spam / hijacked / criminal hosting netblocks	CIDR · ASN	daily
Shodan InternetDB (free tier)	CVE-to-IP mapping, exposed services	IP · CVE · port · banner	on-demand lookup
GreyNoise Community	Benign internet-scan suppression (so an SSH scan does not light up every CAVERN rule)	IP · classification · noise score	on-demand lookup

Commercial / Paid Feeds (bring your own credentials)

Commercial feeds plug in via standard receivers using credentials you already pay for. Configure your API key in caver.toml threat_intel section (Doppler-backed) and the feed lights up.

Feed	Coverage	Auth
Recorded Future	Full Intelligence Card API, risk lists, alerts	API token
Mandiant Advantage / Google Threat Intel	APT profiles, IOCs, vulnerability intel	OAuth2 service principal
CrowdStrike Falcon Intelligence	Adversary tracking, IOCs, MalQuery hash lookups	OAuth2 client credentials
VirusTotal Enterprise	File / URL / domain / IP reputation lookups, intelligence searches	API key
Anomali ThreatStream	Aggregated threat intel platform feeds	API key
OpenPhish Premium	Full historic feed + target-brand attribution + WebSocket live stream	API key
abuse.ch Commercial	Commercial use of all abuse.ch feeds with no rate limits	Auth-Key header
GreyNoise Enterprise	Full-context noise dataset, IP scan history, RIOT trust list	API key
DomainTools Iris	Domain whois history, hosting graph, threat scoring	API key + username
Mimecast Threat Intelligence	Email-borne threat intel (URLs, attachments, sender reputation)	OAuth2

Custom Feeds

Any feed you can produce as STIX 2.1, OCSF JSON, CSV, NDJSON, or a webhook can be wired in without writing code. For weirder shapes, drop in a Python normaliser under src/caver_collector/transforms/ and the rest of the pipeline picks it up.

STIX 2.1 / TAXII 2.1

Industry-standard threat intel format. Point Caver at any TAXII 2.1 collection (your ISAC, ISAO, vendor TAXII root, internal MISP, etc.) and indicators land as OCSF 2003 events with full attribution preserved.

JSON / NDJSON / CSV poll

Use the api_poll receiver against any HTTPS endpoint that returns JSON, NDJSON, or CSV. Map fields to OCSF via a small YAML config, no Python required.

Webhook push

If your feed pushes (your vendor's WebSocket, your SOC's MQ bus, a SOAR playbook output), use the webhook receiver. Caver listens, normalises, and indexes.

Bulk file upload

Drop a STIX bundle or CSV into your S3 bucket. The s3_sqs receiver picks it up via SQS notification, normalises, and loads it into the bloom-filter.

MISP self-hosted

Point Caver at your own MISP instance. Caver subscribes to events, pulls attributes, and normalises to OCSF. Per-event attribution + sharing-group preserved.

Custom Python normaliser

For one-of-a-kind feeds, write a Python transform under src/caver_collector/transforms/ implementing the standard interface. Output OCSF, get the bloom-filter + enrichment + CAVERN integration for free.

Custom feed config example

[[threat_intel.custom]]
name = "acme-internal-misp"
receiver = "misp"
endpoint = "https://misp.acme.corp"
auth_key_env = "ACME_MISP_KEY"
sharing_groups = ["internal-ops", "soc-tier-3"]
poll_interval = "5m"

[[threat_intel.custom]]
name = "isac-stix-feed"
receiver = "taxii"
endpoint = "https://taxii.example-isac.org/api/v21/"
collection_id = "94b1a4eb-..."
auth_basic_env = "ISAC_AUTH_BASIC"
poll_interval = "15m"

[[threat_intel.custom]]
name = "soar-playbook-output"
receiver = "webhook"
listen_path = "/ti/playbook"
secret_env = "SOAR_WEBHOOK_SECRET"

How feeds integrate

Every feed, whether default open, commercial, or custom, follows the same path through the platform.

Receiver polls feedper-feed interval or push

→

Normalise to OCSFclass 2003 or 2002

→

Land in caver lakequeryable forever

→

Bloom-filter rebuildnightly, all enabled feeds

From there, four things automatically light up:

CAVERN detection-time matching

CAVERN rules query the bloom-filter at detection time. A lake event matching any enabled IOC fires a high-confidence notable. No per-rule maintenance, the rules are written generically (does this IP appear in any IOC source?) and the operator picks which feeds populate the filter.

enrich_text orchestrator primitive

From the Intelligence chat console: "look up 185.220.101.45" returns per-feed reputation. Same primitive is exposed via MCP for any AI client.

SLAM playbook step

SLAM playbooks dispatch enrich_indicator on notable creation to attach feed context: which feed first saw this IP, when, with what attribution.

ECHO service-health impact

CISA KEV findings against assets in your ECHO service tree degrade the service health score until the CVE is remediated. Overdue KEV CVEs become P1 notables.

caver.toml threat-intel section

[threat_intel]
bloom_filter_rebuild = "0 3 * * *"   # nightly 3am
enabled_default_feeds = ["abusech", "otx", "openphish", "cisa_kev", "cisa_ais"]

[threat_intel.commercial]
recorded_future.api_key_env = "RECORDED_FUTURE_KEY"
crowdstrike.client_id_env = "CS_CLIENT_ID"
crowdstrike.client_secret_env = "CS_CLIENT_SECRET"
virustotal.api_key_env = "VT_API_KEY"

[threat_intel.attribution]
preserve_source = true   # every IOC carries original feed + first-seen + tags
preserve_sharing_group = true   # MISP sharing groups + STIX TLP retained

AI Observatory

Caver provides the most comprehensive security coverage for AI/LLM usage in the industry. 24 dedicated content packs and 200+ purpose-built CAVERN rules cover the full AI threat surface, prompt injection, shadow AI, agent framework abuse, vector-DB exfiltration, supply-chain compromise, voice/image generation misuse, and on-prem inference monitoring.

24

AI content packs

200+

AI detection rules

OCSF 6005

Application Activity

No DLP

agent required

Core CAVERN Rules (ai_usage pack)

api_key_leak_in_promptT1552 · credential access

system_prompt_exfil_suspectT1213 · collection

model_supply_chain_matchT1195 · supply chain

prompt_injection_candidateT1059 · initial access

output_pii_leakT1567 · exfiltration

rag_indirect_injection_indicatorT1059 · indirect exec

dataset_poisoning_indicatorT1567 · exfiltration

cost_anomaly_per_userT1496 · resource hijack

off_hours_token_spikeT1078 · valid accounts

model_variance_burstT1190 · exploit public app

AI Content Packs (24)

ai_usage

42 rules covering prompt injection, PII exfiltration, credential leak, supply chain, cost anomalies, off-hours abuse, jailbreak attempts, model variance, output shape detection. The AI Observatory foundation.

ai_coding_assistants

GitHub Copilot CamoLeak, Cursor IDEsaster, Codeium / Tabnine / Continue.dev, AI-assisted code exfiltration, prompt injection through suggestions.

shadow_ai

Unsanctioned LLM endpoint traffic, employees using personal AI tools that bypass corporate controls and logging.

langflow

LangFlow exploitation (KEV-sourced CVEs). Agent orchestration framework attacks, SSRF via flow execution, unsanctioned tool invocation.

managed_llm

Azure OpenAI, Amazon Bedrock, Google Vertex AI, content filter flooding, key abuse, region drift, anomalous quota consumption.

litellm

LiteLLM proxy gateway security, unauthorised model access, key proxy abuse, audit-log gap detection, virtual-key drift.

portkey

Portkey AI gateway audit, traffic anomaly, fallback-route abuse, semantic-cache poisoning, virtual-key escalation.

cloudflare_ai_gateway

Cloudflare AI Gateway audit log monitoring, access patterns, rate-limit evasion, anomalous routing decisions.

mcp

Model Context Protocol server security, SSRF via tool calls, prompt injection through MCP responses, privilege escalation through tool chains.

vector_db

Pinecone, Weaviate, Qdrant, Chroma, pgvector, mass embedding download, schema enumeration, RAG corpus exfiltration patterns.

embeddings

Embedding-generation abuse, token-cost amplification via large input batches, model-inversion attempts, anomalous embedding spaces.

rag_pipelines

Retrieval-augmented generation security, indirect prompt injection through document corpus, source-citation poisoning, retriever bypass.

agent_frameworks

LangChain, LangGraph, AutoGen, CrewAI, agent loop exhaustion, runaway tool calls, unsanctioned action chains, memory poisoning.

browser_agents

Computer-use / browser-control agents (Claude Computer Use, Browser-Use, Skyvern), unauthorised navigation, form-fill abuse, credential theft.

voice_ai

ElevenLabs, OpenAI Realtime, Bland AI, voice clone misuse, anomalous call volume, off-baseline TTS spend, deepfake-injection detection.

image_generation

DALL-E, Midjourney, Stable Diffusion, Flux, content policy bypass, NSFW filter evasion, brand-impersonation generation patterns.

fine_tuning

Fine-tuning job security, sensitive-data inclusion in training set, model-extraction via tuning, unauthorised adapter download.

on_prem_inference

Ollama, LM Studio, llama.cpp, vLLM, TGI, local inference server exposure, unauthorised model swap, GPU resource abuse.

huggingface

Hugging Face Hub, malicious model download (pickle deserialisation), supply-chain compromise via dependency drift, dataset poisoning.

model_supply_chain

Model hash verification, known-malicious model registry, weight tampering detection, signed-checkpoint enforcement.

prompt_library_drift

System prompt corpus monitoring, unauthorised edits to production prompts, prompt-template injection, system-prompt-version drift.

ai_governance

Sanctioned-tool enforcement, per-user allowed-model lists, regulatory geo-restriction, model-card compliance, audit-log completeness.

ai_cost_governance

Per-tenant / per-user / per-model spend ceilings, batch-API abuse, fine-tune cost runaway, embedding-cost amplification.

ai_data_governance

PII / PHI / PCI shape detection in prompts, sensitive-document upload tracking, regional data-flow enforcement (GDPR / CCPA / SOC 2).

Supported AI Tools

Telemetry is normalized to OCSF Application Activity (class 6005) by the caver-collector ai_usage_normalize source. All rules operate on OCSF fields, no raw HTTP inspection, no perimeter TLS break.

Caver-Collector v1.175

The security data pipeline layer. Three equally-featured backends, Python for orchestration and complex normalisation, Vector (Rust) for high-throughput hot paths, and an OTel collector distro for organisations already running OTel infrastructure.

21

Receivers

44

Transformation & Normalization

22

Sinks

70+

SaaS Adapters

Python Pipeline

The Python orchestrator handles control-plane logic, vendor-specific API polling, and complex normalisation that benefits from Python's ecosystem.

Receivers (21)

Receiver	Purpose	Protocol / source
`hec`	Splunk HTTP Event Collector ingress	HTTPS · JSON / raw events
`hec_replay`	Replay recorded HEC NDJSON payloads	file / S3 NDJSON archive
`syslog`	Standard syslog ingress	RFC3164 + RFC5424 · TCP/UDP/TLS
`tcp_line`	Reliable newline-delimited TCP	TCP framed by newline
`udp_packet`	Generic UDP datagram (no syslog framing)	UDP raw
`file_tail`	File log tailing with checkpointing	Local files · rotation aware
`webhook`	Generic HTTP POST endpoint for SaaS push	HTTPS · arbitrary JSON
`otlp_http`	OpenTelemetry Protocol over HTTP	OTLP/HTTP · logs/metrics/traces
`kafka_consumer`	Kafka topic consumer with offset tracking	Kafka 2.x+ · SASL/SCRAM · mTLS
`kinesis_firehose`	AWS Kinesis Firehose HTTP push	HTTPS · Firehose record format
`s3_sqs`	S3 object notifications via SQS queue	S3 → SQS → object fetch
`caver_lake_replay`	Replay caver lake Parquet partitions	OCSF Parquet · time-bounded
`windows_event_log`	Windows EvtSubscribe live channel subscription	WMI EvtSubscribe
`uf_compat`	Splunk Universal Forwarder S2S inbound	Splunk S2S binary protocol
`api_poll`	SaaS API poller (70+ adapters)	HTTPS · per-vendor pagination
`modbus_tcp`	Modbus TCP industrial protocol	Modbus TCP · function code 1-127
`dnp3`	DNP3 SCADA protocol monitor	DNP3 TCP/UDP
`bacnet`	BACnet/IP building automation	BACnet/IP UDP
`iec104`	IEC 60870-5-104 power grid SCADA	IEC 104 TCP
`ethernet_ip`	EtherNet/IP (Allen-Bradley CIP)	EtherNet/IP TCP/UDP
`s7comm`	Siemens S7-300/400/1200/1500 PLC	S7Comm TCP

SaaS API Adapters (70+)

Transformation & Normalization (44)

Vector Backend (vector-caver)

Rust-speed hot path for high-throughput log sources. Fork of Vector (MPL-2.0) with four custom Caver crates targeting 300k+ events/s.

vrl-caver-stdlib

VRL functions: OCSF normalisation, ATT&CK tagging, threat-intel lookups, CVE enrichment, HMAC tokenisation, security parsing. Used with Vector's native remap transform.

caver-ottl-funcs

OTTL functions for the OTel backend: HMAC tokenisation, enrichment table lookups, OCSF field helpers. Injected into the transform processor factory.

caver-sink-search-peer

Pushes batched OCSF events to Caver's HEC endpoint at wire speed. Used when Vector feeds the Caver indexer directly.

caver-sink-parquet

Writes OCSF events to the caver lake partition scheme. Arrow/Parquet v55, Snappy compression, dictionary encoding on 6 OCSF enum columns.

OTel Backend

An OpenTelemetry Collector distribution (caver-otelcol) built with ocb v0.129. Deploy it anywhere you already run OTel infrastructure, it speaks native OTLP and drops into existing pipelines.

receivers:
  filelog:
    include: ["/var/log/app/*.jsonl"]
processors:
  transform/ocsf:
    log_statements:
      - context: log
        statements:
          - set(attributes["class_uid"], 4001)
          - set(attributes["metadata.product.name"], "caver-otelcol")
  batch:
    send_batch_size: 500
    timeout: 30s
exporters:
  awss3:
    s3uploader:
      endpoint: "http://minio:9000"
      s3_bucket: "caver-lake"
    marshaler: parquet

Caver Forge

Caver Forge is the companion product that turns newly published CVEs into queryable detection content with no human in the loop. It scrapes new CVEs from configurable threat-intel feeds, drafts Sigma rules via Claude grounded in your OCSF schema, transpiles to SPL, runs the query through Caver's /services/search/run endpoint, and writes Detection Finding (OCSF 2004) rows to stage_alerts/ in the lake. The output answers one question continuously: "would my lake have caught this CVE if I had been watching?"

10

Threat-intel feeds

~$0.01

Per CVE in LLM tokens

Staged

Not production alerts

OCSF 2004

Detection Finding output

What it does (and what it doesn't)

Yes, use Forge for

A continuous "did the lake even have data that could have caught this CVE?" signal across every CVE published in the last N hours. Runs autonomously, no analyst time required. Output lands as OCSF Detection Finding rows under stage_alerts/, queryable like any other lake data.

No, do not use Forge for

Real-time alerting. Forge is staged (background loop), Detection Finding rows land minutes to hours after a CVE is published. Use CAVERN for hand-authored detections that need to fire immediately. Forge complements CAVERN, it does not replace it.

Threat Intel Feed Registry

Forge ships a curated default feed registry. Two feeds are enabled by default (NVD + CISA KEV); the rest are listed and opt-in. User overrides live in ~/.config/cve-forge/feeds.yaml.

ID	Type	Default	What it covers
`nvd`	NVD	enabled	NIST canonical CVE feed (CPE-matched, CVSS-scored)
`cisa-kev`	CISA KEV	enabled	CISA Known Exploited Vulnerabilities. Federal patch deadline
`osv`	OSV	opt-in	OSV.dev open-source ecosystem vulnerabilities
`ghsa`	GHSA	opt-in	GitHub Security Advisories
`zdi`	RSS	opt-in	Zero Day Initiative published advisories
`msrc`	MSRC	opt-in	Microsoft Patch Tuesday CVRF bulletins
`cisco-psirt`	Cisco openVuln	opt-in	Cisco PSIRT advisories
`project-zero`	Atom	opt-in	Google Project Zero public bug tracker
`github-advisory-db`	Git repo	opt-in	github/advisory-database canonical mirror
`vulncheck-nvd`	VulnCheck	opt-in	VulnCheck NVD++ commercial mirror (faster + enriched)

Custom feeds drop in via the same YAML, point at any RSS / JSON / Atom endpoint:

feeds:
  - id: my-private-feed
    name: "Internal threat intel"
    type: rss
    url: https://intel.internal.example.com/feed
    enabled: true
    auth_env: INTERNAL_FEED_TOKEN

Pipeline

Each CVE flows through five stages, fully observable, retry-safe, idempotent on CVE ID. Stages emit structured telemetry to Caver's /metrics for ECHO health tracking.

Scrapeenabled feeds, last N hours

→

GenerateClaude prompt grounded in OCSF schema

→

TranspilepySigma to SPL

→

RunPOST to caver /services/search/run

→

SinkOCSF 2004 to stage_alerts/

Enabled feeds (NVD, CISA KEV, custom, ...)
    │
    ▼
cve_forge.scrape   normalised CVE record (id, description,
    │               CVSS, CPEs, CWEs, references)
    ▼
cve_forge.generate Claude prompt grounded in caver/sources.json,
    │               returns a validated Sigma YAML
    ▼
cve_forge.transpile pysigma + pysigma-backend-splunk to SPL
    │               (sentinel-aware: _not_applicable: true on
    │                rules that cannot match this lake's sources)
    ▼
cve_forge.run      POST SPL to caver, read X-Caver-Row-Count
    │               header, build StagedResult
    ▼
S3 OCSF parquet sink   s3://lake/stage_alerts/year=.../...
                       (class_uid 2004, Detection Finding)

Sentinel handling

Some CVEs do not map to any data shape your lake collects (firmware vulns in IoT devices you do not monitor, vulns in apps your org does not run). Forge does not pretend to generate a rule for those. The generate stage returns _not_applicable: true with a structured reason, the row lands in stage_alerts/ with status not_applicable, and the loop moves on. The kiosk surfaces these so an operator can review the coverage gap and decide whether to add a source.

Configuration

Forge runs as its own process with its own config. Caver reads Forge's output as one more lake source. Minimal install:

pip install caver-forge   # separate distribution

# point at your caver instance + LLM
export CAVER_URL=https://caver.internal:8089
export CAVER_TOKEN=$(doppler secrets get CAVER_HEC_TOKEN --plain)
export ANTHROPIC_API_KEY=$(doppler secrets get ANTHROPIC_API_KEY --plain)

# write target (same MinIO bucket caver reads from is fine)
export FORGE_S3_BUCKET=caver-lake
export FORGE_S3_PREFIX=stage_alerts/

# kick off a one-shot scrape
cve-forge scrape --hours 24

# or run the autonomous background loop
cve-forge loop --interval 1h

Caver-side source config

Add stage_alerts to your Caver sources.json so the kiosk + SPL surfaces light up the same way they would for any other source:

{
  "stage_alerts": {
    "bucket": "caver-lake",
    "prefix": "stage_alerts/",
    "schema": "ocsf_detection_finding_v1.3.0",
    "default_index": "stage_alerts"
  }
}

CLI Reference

Command	Description
`cve-forge feeds list`	Show the registry + enabled state
`cve-forge feeds enable <id>`	Enable a feed (writes to `~/.config/cve-forge/feeds.yaml`)
`cve-forge feeds disable <id>`	Disable a feed
`cve-forge scrape --hours N`	One-shot, pull from every enabled feed for the last N hours
`cve-forge scrape --feeds-config /path/to/feeds.yaml`	Use a custom feed config file
`cve-forge loop --interval 1h`	Run autonomously, scrape every interval
`cve-forge replay <cve_id>`	Re-process a specific CVE (debugging / re-grounding)
`cve-forge stats`	Coverage stats: how many CVEs in the last 7d, how many staged, how many `not_applicable`, cost
`cve-forge cost --since <date>`	LLM token cost report

Forge vs CAVERN

Forge and CAVERN sit at opposite ends of the detection lifecycle. Forge is breadth-first (every CVE, no human), CAVERN is depth-first (every rule hand-tuned). They feed each other.

Aspect	Caver Forge	CAVERN
Authoring	Autonomous (Claude-generated)	Hand-authored by detection engineers
Trigger	New CVE published	Threat model / hypothesis / incident
Speed	Minutes to hours after CVE drop	Days to weeks (proper engineering cycle)
Output	`stage_alerts/` Detection Finding rows	Production CAVERN rules, RBA-scored notables
Alerting	None (staged, batch)	Real-time, oncall paging via SLAM
Tuning	Per-CVE auto-regenerate	FP/TP feedback loop with caver-cavern auto-tune
Cost model	~$0.01 per CVE in LLM tokens	Detection engineer time
Promotion	Stage to production via SLAM Phase-5 bridge	Already in production

Promotion bridge

When a Forge-generated staged finding surfaces an obviously valuable detection (high CVSS, observed in your lake, plausible threat model), SLAM's Phase-5 bridge offers a one-click promotion: the staged Sigma rule moves into a CAVERN content pack with operator review, gets fixtures, and joins the production detection set. The lineage from CVE to staged rule to production CAVERN rule is preserved end-to-end.

OT / ICS, Caver Industrial

Caver Industrial extends the core platform to operational technology and industrial control system environments. It ships as a separate plugin with per-deployment pricing.

BACnet

Building automation protocol monitoring. Anomalous read/write to building systems, HVAC, access control.

DNP3

Distributed Network Protocol for SCADA and electric grid. Unauthorised control messages, abnormal polling.

Modbus TCP

Industrial protocol for PLCs and sensors. Coil/register manipulation, abnormal function codes.

IEC 104

SCADA protocol widely used in European power infrastructure. Control commands from unexpected sources.

S7Comm

Siemens PLC protocol (S7-300/400/1200/1500). Firmware read attempts, unauthorised CPU state changes.

Ethernet/IP

Rockwell / Allen-Bradley CIP protocol. Tag read/write anomalies, unexpected I/O scanner connections.

IT/OT Correlation

The it_ot_correlation and scada_windows packs bridge IT and OT telemetry, correlating Windows event logs from engineering workstations with industrial protocol anomalies on the same OT network segment. An attacker moving from the IT perimeter to a SCADA workstation to a PLC shows up as a single correlated timeline in SLAM.

Vendor integrations

Dragos ↗

Native Dragos Platform receiver. Pulls asset inventory, threat detections, and vulnerability assessments. Normalises to OCSF for unified visibility alongside IT telemetry.

Claroty ↗

Claroty CTD integration via both native API and replication approaches. Asset discovery, network baselining, and anomaly alerts normalised into the CAVERN detection pipeline.

Compare Caver

Honest side-by-side comparisons against the field. Pick a tab, the comparison expands inline: at-a-glance table, where the incumbent wins, where Caver wins, and how to decide.

Caver vs Splunk →

Honest side-by-side comparison: at-a-glance table, where they win, where Caver wins, how to decide.

Caver vs Elastic →

Honest side-by-side comparison: at-a-glance table, where they win, where Caver wins, how to decide.

Caver vs Microsoft Sentinel →

Honest side-by-side comparison: at-a-glance table, where they win, where Caver wins, how to decide.

Caver vs Sumo Logic →

Honest side-by-side comparison: at-a-glance table, where they win, where Caver wins, how to decide.

Caver vs Cribl Stream →

Honest side-by-side comparison: at-a-glance table, where they win, where Caver wins, how to decide.

Caver vs Wazuh →

Honest side-by-side comparison: at-a-glance table, where they win, where Caver wins, how to decide.

Caver vs Dragos →

Honest side-by-side comparison: at-a-glance table, where they win, where Caver wins, how to decide.

Caver vs Lakera →

Honest side-by-side comparison: at-a-glance table, where they win, where Caver wins, how to decide.

Caver vs Mindgard →

Honest side-by-side comparison: at-a-glance table, where they win, where Caver wins, how to decide.

Active comparison

Caver vs Splunk

An honest comparison of Caver against Splunk Enterprise and Splunk Cloud. What Splunk does well, where Caver wins, and how to decide.

Splunk is the platform most teams compare Caver against, because Splunk is the platform most teams are running today. This page is the honest comparison.

At a glance

	Splunk Enterprise / Cloud	Caver
License model	Ingest-per-day (Splunk Enterprise) or workload pricing (Splunk Cloud). Notoriously hard to predict.	Per-deployment, transparent license-key. No per-GB meter.
Cost trajectory at scale	Grows with telemetry volume.	Flat per deployment.
Cold-data search	Requires rehydration from frozen / archived buckets to a hot tier. Slow and expensive.	Native search over object storage. No rehydration.
Storage format	Splunk proprietary buckets.	Parquet, iceberg, and similar open formats.
Query languages	SPL.	SPL + KQL + SQL natively, all on the same backend with a language toggle. Plus AI agents over MCP, Grafana, DuckDB, Trino, and Athena against the same OCSF Parquet lake.
Forwarders	Universal Forwarder, per-agent licensing implications.	Pairs with caver-collector or your existing forwarders. No per-agent license.
Deploy time	Quarters for enterprise rollout.	Days for a working pilot.
App ecosystem	Splunkbase: largest catalog (thousands), 15+ years deep. Quality varies; many apps abandoned or shallow. Field normalization left to the operator.	Curated, meticulously authored. Each vendor pack includes dashboards, saved searches, data inputs, and OCSF field mappings. Daily updates. Ships with the product, no third-party install. Migrators auto-port Splunk dashboards, saved searches, ES correlation, ITSI, UBA, and SOAR playbooks.
Vendor lock-in	High (proprietary format, proprietary catalog).	Low (open storage formats, no proprietary catalog).
OT / ICS coverage	Splunk Industrial Asset Intelligence was deprecated in 2023. What’s left is community Splunkbase apps and the OT Security Add-on, both layered on the same general-purpose stack. No first-class industrial protocol decoding.	caver-industrial: passive deep-packet decoders for BACnet/IP, S7Comm, IEC 60870-5-104, DNP3, Modbus TCP, EtherNet/IP, OPC-UA. Framework alignment for NIST 800-82 + IEC 62443. Air-gap-friendly deploy. Curated industrial threat intel.
AI security visibility	Limited.	caver-aisec, purpose-built.

Strengths

Where Splunk wins	Where Caver wins
Ecosystem maturity. Splunkbase has been around for fifteen years. If there’s an obscure integration you need, someone has probably written it for Splunk.	Cost predictability. No quarterly ingest-license renegotiation. No “we ingested too much last month” surprises.
Enterprise sales motion. Splunk has the procurement story your CFO’s office already knows how to evaluate.	Cold data is just data. Years of telemetry, searchable at interactive speeds, without paying hot-tier storage cost or running a rehydration pipeline.
Operator pool. Hiring “a Splunk admin” is easier than hiring for any newer platform, just by candidate density.	No storage lock-in. Your data lives in object storage in standard formats. Your data engineering team can use the same data for non-SIEM purposes.
	Query however you want. SPL, KQL, and SQL natively on the same backend. AI agents over MCP. Grafana, DuckDB, Trino, Athena over the same Parquet lake. Splunk gives you SPL.
	Content packs that ship complete. Each Caver pack includes dashboards, saved searches, data inputs, and OCSF field mappings, daily-updated. Splunkbase apps vary widely in completeness and freshness.
	Drop-in compatibility. Add Caver as a search peer to your existing Splunk environment. Operators don’t change tools. SPL queries fan out to both.
	Faster pilots. A working Caver pilot is days of work, not the multi-quarter procurement-plus-implementation cycle.
	OT and AI surfaces. caver-industrial and caver-aisec are first-class extensions, not bolt-ons.

How to decide

If you have an existing Splunk investment you can’t justify ripping out, run Caver as a search peer alongside it. Use Caver for long-retention and cold-tier search; let Splunk continue to serve the workflows operators already know. The two work together.

If you’re greenfield, evaluate both. Splunk’s ecosystem maturity matters; Caver’s cost trajectory matters more if you expect telemetry volume to grow.

If you’ve already hit your Splunk license ceiling and the next true-up is the trigger for this conversation: that’s a great moment for Caver.

Talk to us about scoping — or read about how Caver works first.

Caver vs Elastic

Caver compared to Elasticsearch / Elastic Stack and Elastic Security. Open source pedigree vs. commercial focus, scale-engineering tax, and where each one fits.

At a glance

	Elastic Stack / Elastic Security	Caver
License model	Open source (Apache 2.0 / Elastic License v2) + commercial subscriptions.	Per-deployment commercial license-key.
Storage	Elasticsearch indices. ILM moves data through hot / warm / cold / frozen tiers.	Native parquet / iceberg on object storage.
Cold-tier search	Frozen tier requires searchable snapshots with a performance penalty.	First-class search over object storage.
Query languages	KQL, EQL, ES\|QL, Lucene.	SPL + KQL + SQL natively, all on the same backend with a language toggle. Plus AI agents over MCP, Grafana, DuckDB, Trino, Athena over the same OCSF Parquet lake. ES\|QL native on roadmap.
Operator pool	Broad open-source community.	Smaller, focused on commercial deployments.
Scale engineering	Your team owns shard sizing, ILM tuning, rolling restarts, version upgrades.	We own the operational complexity.
Content ecosystem	Elastic integrations catalog plus community-authored content. Quality varies; many integrations require operator tuning.	Curated vendor packs that ship with dashboards, saved searches, data inputs, and OCSF field mappings. Daily updates. No third-party install.
OT / ICS coverage	No first-class OT product. Beats can ingest industrial telemetry via custom processors and community-authored content, but no out-of-box industrial protocol decoders and no framework-aligned content.	caver-industrial: passive deep-packet decoders for BACnet/IP, S7Comm, IEC 60870-5-104, DNP3, Modbus TCP, EtherNet/IP, OPC-UA. Framework alignment for NIST 800-82 + IEC 62443. Air-gap-friendly deploy. Curated industrial threat intel.
AI security visibility	Limited.	caver-aisec, purpose-built.

Strengths

Where Elastic wins	Where Caver wins
Open source heritage. You can run Elasticsearch entirely under your own roof, no commercial relationship required.	No scale-engineering tax. Shard layout, ILM policy, version upgrades, rolling restarts: these are operator-burden line items at any non-trivial Elastic deployment. Caver doesn’t ship that burden to you.
Broad community. Decades of community-authored detection content, dashboards, integrations.	Cold-tier economics. Searchable snapshots are a real Elastic capability, but they pay a measurable performance penalty. Caver’s object-storage search doesn’t.
Flexibility. Elasticsearch is a general-purpose engine that happens to also do SIEM. If you need full-text search, geospatial queries, log analytics, and observability all on the same platform, that’s a real story.	Query however you want. SPL, KQL, and SQL natively on the same backend. AI agents over MCP, Grafana, DuckDB, Trino, Athena over the same Parquet lake. Operators coming from KQL keep their language; operators on SPL or SQL get theirs too.
No license-key gate. Run as many clusters as you like with no per-deployment commercial conversation.	Content packs that ship complete. Each Caver pack includes dashboards, saved searches, data inputs, and OCSF field mappings, daily-updated. Elastic integrations vary widely in depth and freshness.
	Purpose-built. Caver is SIEM-focused. Elasticsearch is general-purpose with a SIEM product layered on it. The differences show up at the edges (deployment hardening defaults, audit posture, retention guarantees).

How to decide

If you have strong Elasticsearch operators on staff and the cluster is already healthy, Elastic Security on top of it is a reasonable answer.

If you’re paying real money in operator time for cluster maintenance, version upgrades, or shard sizing, and you’d rather that time go elsewhere, Caver removes that line item.

If you need OT / ICS visibility, caver-industrial is in a different league than what’s available for Elastic.

Talk to us about scoping.

Caver vs Microsoft Sentinel

Caver compared to Microsoft Sentinel. Azure-bound SaaS SIEM with KQL as its native language vs Caver, which speaks KQL natively but doesn't lock you to Azure.

Microsoft Sentinel is the SIEM most cloud-native shops compare Caver against if they’re already running on Azure. The honest answer: if you’re all-in on Azure and the Microsoft ecosystem, Sentinel is hard to beat. If you’re not, or if Azure lock-in is a problem you want to avoid, Caver is the structural answer.

Worth noting up front: Caver speaks KQL natively. The Sentinel-migration story is real, not aspirational.

At a glance

	Microsoft Sentinel	Caver
Deployment model	Azure-only SaaS. Multi-tenant managed by Microsoft.	Self-hosted in your cloud, on-prem, or air-gapped. BYO Azure / AWS / GCP / on-prem.
License model	Azure ingest + retention pricing. Variable by Log Analytics workspace and commit tier.	Per-deployment commercial license-key. Flat. No per-GB meter.
Cost predictability	Variable. Ingest spikes cost real money.	Predictable. Flat per deployment.
Cold-tier search	Archived logs require restore to interactive tier. Slow and expensive.	First-class search over object storage. No rehydration.
Storage	Microsoft-managed Log Analytics + Azure storage. Opaque to you.	Native Parquet on object storage. Your bucket, your keys, open format.
Query languages	KQL only.	KQL natively (same operator surface: where, extend, summarize, join inner/leftouter/anti, let, parse, mv-expand, bin, case/iff, union). Plus SPL and SQL. Plus AI agents over MCP, Grafana, DuckDB, Trino, Athena.
Azure ecosystem integration	Deep. Defender for Cloud, Defender for Endpoint, Purview, Entra ID, Sentinel Notebooks.	Standalone. Integrates via APIs and standard log sources, but isn’t a Microsoft-blessed component.
SOAR	Logic Apps. Mature but Azure-bound.	SLAM. Built into Caver. Configuration-as-code playbooks, version-controlled, no separate Logic Apps subscription.
Content packs	Sentinel content hub (community + Microsoft + partner). Variable depth, Azure-flavored.	35+ vendor packs with bundled dashboards + data inputs + OCSF field mappings. Daily updates.
Threat intelligence	Microsoft Threat Intelligence + Sentinel TI connectors.	Curated industrial threat intel for caver-industrial; AI threat feeds (NIST AI 100-2, OWASP) for caver-aisec; built-in TI integration for the core.
OT / ICS coverage	Defender for IoT (separate product, separate license).	caver-industrial: passive decoders for BACnet/IP, S7Comm, IEC 60870-5-104, DNP3, Modbus TCP, EtherNet/IP, OPC-UA. NIST 800-82 + IEC 62443. Air-gap-friendly.
AI security visibility	Limited.	caver-aisec: prompt-injection detection, AI Observatory for LLM spend, NIST AI 100-2 + OWASP feeds.
Data residency	Azure regions only.	Your chosen environment, your storage account.
Air-gap deployment	Not supported.	Supported, including caver-industrial.
Migration tooling	Migration paths from Splunk, ArcSight, QRadar via Microsoft-published guides (manual).	caver-migrate ports dashboards, saved searches, ES correlation, ITSI, UBA, SOAR/Demisto playbooks. 9-of-9 migrator coverage tested end-to-end. KQL-native landing for Sentinel queries.

Strengths

Where Sentinel wins	Where Caver wins
Deep Azure integration. If your stack is Defender + Purview + Entra ID + Logic Apps + Sentinel, the integration story is unbeatable. Microsoft does this category as well as anyone.	Not locked to Azure. Multi-cloud, on-prem, hybrid, air-gapped, edge. Sentinel can’t follow.
Microsoft enterprise sales motion. Procurement, EA renegotiation, Microsoft credit consumption: all paths your finance team already understands.	Transparent licensing. Per-deployment flat. No ingest meter, no commit-tier negotiation, no quarterly Azure-bill surprise.
KQL is its native language. Operators who came up on Sentinel won’t switch query languages.	Query flexibility. KQL natively (so Sentinel queries land directly), plus SPL and SQL. Sentinel gives you KQL only.
Mature SOAR via Logic Apps. If you’re already a Logic Apps shop, the SOAR story is already familiar.	Cost predictability at scale. Azure ingest pricing punishes growth. Per-deployment pricing doesn’t.
Sentinel Notebooks for hunt-style investigations in Jupyter.	OT / ICS coverage. caver-industrial is first-class. Defender for IoT is a separate product with separate licensing and limited integration.
	AI security visibility. caver-aisec is first-class. Sentinel doesn’t have a comparable AI-runtime product.
	Data residency. Your storage account, your keys, open format. Sentinel’s storage is opaque.
	Air-gap and on-prem. Sentinel can’t run in either; Caver can.
	Open storage. Same OCSF Parquet data is queryable by Grafana, DuckDB, Trino, Athena, AI agents, your data engineering team for non-SIEM purposes. Sentinel’s storage is locked behind KQL only.
	Migration tooling. caver-migrate ports a Sentinel deployment in one command, including the KQL queries that land directly on Caver’s native KQL engine.

How to decide

Stay on Sentinel if: - You’re all-in on Azure and the Microsoft ecosystem. - Your security team’s depth is in KQL and the Defender stack. - Variable ingest pricing is something your finance team is OK negotiating quarterly. - You don’t need OT/ICS, AI security, multi-cloud, or air-gap deployment.

Move to Caver if: - You want to leave Azure-only or you’ve already left. - Your ingest is growing fast enough that the Azure bill has become an existential conversation. - You need OT/ICS, AI security, multi-cloud, or air-gapped deployment. - You want your data in open formats your data engineering team can also use. - Your operators speak SPL or want to add SQL alongside KQL.

Run both during a migration window if: - You have an existing Sentinel investment you can’t justify ripping out immediately. - Stand Caver up alongside, point new data sources at it, gradually move the KQL queries (they land natively). Decommission the Sentinel workspace when the bill is gone.

Talk to us about scoping — or read about caver-migrate for the Sentinel migration path.

Caver vs Sumo Logic

Caver compared to Sumo Logic. Cloud-native SIEM economics, multi-tenant SaaS tradeoffs, and where each fits.

At a glance

	Sumo Logic	Caver
Deployment model	Multi-tenant SaaS only.	Self-hosted in your cloud, on-prem, or air-gapped.
License model	Per-GB ingest + per-credit pricing.	Per-deployment commercial license-key.
Data residency	Sumo’s chosen regions.	Your chosen environment, your storage account.
Storage	Sumo’s.	Your object storage.
Cold-tier search	Continuous and frequent tiers; performance varies by tier.	Single object-storage tier, consistent performance.
Query languages	Sumo’s own query language plus LogReduce / LogCompare.	SPL + KQL + SQL natively, all on the same backend with a language toggle. Plus AI agents over MCP, Grafana, DuckDB, Trino, Athena over the same OCSF Parquet lake.
Content ecosystem	Sumo Apps catalog (vendor-published, varying depth).	Curated vendor packs that ship with dashboards, saved searches, data inputs, and OCSF field mappings. Daily updates. No third-party install.
Air-gap deployment	Not supported.	Supported, including for caver-industrial.
Custom integration cost	API integration.	Direct repo access in customer environment.
OT / ICS coverage	No first-class OT product. The multi-tenant SaaS deployment model is structurally incompatible with air-gapped industrial environments.	caver-industrial: passive deep-packet decoders for BACnet/IP, S7Comm, IEC 60870-5-104, DNP3, Modbus TCP, EtherNet/IP, OPC-UA. Framework alignment for NIST 800-82 + IEC 62443. Air-gap-friendly deploy. Curated industrial threat intel.
AI security visibility	Limited.	caver-aisec, purpose-built.

Strengths

Where Sumo Logic wins	Where Caver wins
Zero infrastructure operations. Sumo runs the platform. Your team writes queries and reads dashboards.	Data sovereignty. Your data stays in your environment. Important for regulated industries, government work, OT, and any organization with a “data does not leave our infrastructure” policy.
MSP and multi-tenant friendliness. Sumo has a strong MSP / managed-service motion built into the product.	Cost trajectory at scale. Per-GB ingest pricing punishes growth. Per-deployment pricing doesn’t.
Predictable SaaS economics if your volume is stable. When your ingest doesn’t grow much month-to-month, Sumo’s pricing is straightforward.	Air-gap and on-prem. Sumo cannot run inside an air-gapped industrial network. Caver can.
	Open storage format. Your data engineering team can use the same parquet / iceberg data for non-SIEM purposes. Sumo’s storage is opaque to you.
	Query however you want. SPL, KQL, and SQL natively on the same backend. AI agents over MCP, Grafana, DuckDB, Trino, Athena over the same Parquet lake. Sumo gives you Sumo’s query language.
	Content packs that ship complete. Each Caver pack includes dashboards, saved searches, data inputs, and OCSF field mappings, daily-updated.
	OT and AI surfaces. caver-industrial and caver-aisec have no Sumo equivalent.

How to decide

If you’re cloud-native, multi-tenant, and your data-residency requirements don’t matter, Sumo is a reasonable SaaS answer.

If you have regulatory, compliance, or operational reasons to keep data in your own environment, or if your ingest is growing fast enough that per-GB pricing has become an existential conversation, Caver is the structural answer.

If you have OT, ICS, or air-gapped requirements, Caver is the only one of the two that can actually deploy there.

Talk to us about scoping.

caver-collector vs Cribl Stream

A direct comparison between the caver-collector pre-index pipeline tier and Cribl Stream. Where the established player wins, where the integrated stack wins.

This comparison is specifically about caver-collector, the Caver-family pre-index pipeline component, against Cribl Stream. (For full SIEM-vs-SIEM comparisons, see vs Splunk, vs Elastic, or vs Sumo Logic.)

At a glance

	Cribl Stream	caver-collector
Position	Independent pipeline tier in front of any SIEM.	Pipeline tier integrated with the Caver storage and search stack (also runs standalone).
Underlying engine	Cribl’s purpose-built pipeline.	Vector + OpenTelemetry dual backend.
License model	Cribl commercial license.	Per-deployment license-key (or included with Caver).
Pipeline UI	Mature visual pipeline builder.	Configuration-as-code first; UI a secondary surface.
Routing	Multi-destination routing, broadly.	Multi-destination routing, broadly.
Transformation primitives	Cribl’s own pack catalog.	Vector + OTel native primitives plus Caver-specific manipulation. 14 new stateless transforms shipped last week (parse_csv, parse_kv, cast_field, hash_field, rename_field, coalesce, extract_timestamp, filter, mask_value, json_parse, field_extract, rate_limit, dedupe, and more).
Adapter / source ecosystem	Cribl Packs catalog plus vendor-published TAs.	60+ vendor adapters across two release cycles (Webex, Lacework, Mattermost, Buildkite, Discord, Meraki, CircleCI, Linode, MongoDB, and many more). Each ships with OCSF field mapping built in.
Industrial protocol decoding	Cribl Stream can route OT telemetry (syslog, raw TCP, custom inputs) but doesn’t decode industrial protocols natively. No first-class OT product story.	7 passive deep-packet decoders (BACnet/IP, S7Comm, IEC 60870-5-104, DNP3, Modbus TCP, EtherNet/IP, OPC-UA). Air-gap-friendly. Pairs with caver-industrial on the SIEM side for NIST 800-82 + IEC 62443 framework alignment.
Vendor independence	Vendor-neutral (works in front of any SIEM).	Vendor-neutral (works in front of any SIEM), with first-class integration into Caver.

Where Cribl wins

Maturity. Cribl has been the dominant independent pipeline tier for years. The UI is more polished. The pack catalog is broader. The operator pool is bigger.
Pure-play independence. Cribl explicitly positions as vendor-neutral. If neutrality from any single SIEM vendor is a procurement requirement, Cribl tells that story cleanly.
Visual pipeline builder. If your team prefers to drag-and-drop transformations rather than write configuration, Cribl’s UI is better.

Where caver-collector wins

Open-source engine pedigree. Vector and OpenTelemetry are both open-source, both mature, both have huge community ecosystems. Cribl’s engine is proprietary.
Integration with Caver. When paired with Caver, the storage, search, content, and pipeline tiers all share an operator surface. With Cribl + Caver, you get two separate operator surfaces.
Configuration-as-code first. GitOps-friendly. Version-controlled pipeline definitions. PR review on pipeline changes.
No additional commercial conversation if you already have Caver. caver-collector is included with Caver deployments. Cribl is a separate purchase.

How to decide

If you’ve already chosen Cribl and it’s working, there’s no urgent reason to replace it. Cribl + Caver is a valid combination; Caver doesn’t care what fronts it.

If you’re greenfield and considering both, evaluate the integrated-stack benefit of caver-collector + Caver against Cribl’s maturity advantage. For most teams, the integrated stack wins on operational complexity. For teams that need a long-term independent pipeline tier as a deliberate architectural choice, Cribl wins.

Talk to us about scoping.

Caver vs Wazuh

Caver compared to Wazuh (open-source SIEM/XDR). Why these two are a deliberate combo, not an either-or choice.

Wazuh is the dominant open-source SIEM/XDR. Free, Apache 2 licensed, deployed on roughly 25 million endpoints worldwide. The honest answer to “Caver vs Wazuh” is that for most teams, they’re a combo, not a choice. Wazuh is the endpoint-agent + assessment layer; Caver is the storage, search, analytics, AI-security, and OT layer behind it.

At a glance

	Wazuh	Caver
License model	Open source (Apache 2). Free agent + manager + indexer. Paid Wazuh Cloud for SaaS.	Per-deployment commercial license-key.
Endpoint agent	First-class: Windows, macOS, Linux, Solaris, AIX. Self-updating, signed, manageable.	None native. We recommend Wazuh agent + Caver as the analytics tier.
File Integrity Monitoring (FIM)	First-class via Wazuh agent syscheck module.	Inherited from Wazuh agent via the partnership integration; CAVERN detection content consumes the events.
Compliance modules	PCI DSS, HIPAA, NIST 800-53, GDPR mapped to rules out-of-box.	Same coverage via Wazuh agent + dedicated compliance mapping page on docs.etairos.ai (planned).
Configuration assessment	CIS benchmark scanning via Wazuh sca module.	Inherited from Wazuh agent.
Vulnerability scanning	Package-CVE matching via Wazuh vulnerability detector.	Inherited from Wazuh agent.
Storage backing	Wazuh indexer (OpenSearch fork). Same scale-engineering tax as Elastic.	Native Parquet on object storage. No shard sizing, no ILM tuning.
Query languages	OpenSearch DSL (Lucene).	SPL + KQL + SQL natively. AI agents over MCP, Grafana, DuckDB, Trino, Athena over the same Parquet lake.
Container / K8s security	Wazuh-Kubernetes integration via API audit.	Falco + Trivy partnership for first-class K8s coverage: runtime detection, image scanning, and policy violations all normalised into the CAVERN content pipeline.
OT / ICS coverage	Limited; no first-class industrial product.	caver-industrial: passive decoders for BACnet/IP, S7Comm, IEC 60870-5-104, DNP3, Modbus TCP, EtherNet/IP, OPC-UA. NIST 800-82 + IEC 62443. Air-gap-friendly.
AI security visibility	None.	caver-aisec: prompt-injection detection, AI Observatory for LLM spend, NIST AI 100-2 + OWASP feeds.
Content packs / integrations	Wazuh agent built-in modules plus community-contributed rules.	35+ vendor packs shipping with dashboards, saved searches, data inputs, and OCSF field mappings. Daily updates.
Cold-tier search	OpenSearch ISM moves data through hot/warm/cold/frozen with performance penalties.	First-class search over object storage. No rehydration.

Strengths

Where Wazuh wins	Where Caver wins
The endpoint agent itself. ~25M endpoints can’t be wrong. Mature, signed, self-updating, broad OS coverage.	The analytics tier. Object storage backing instead of OpenSearch indexer. No shard sizing, no ILM tuning, no version-upgrade pain.
Free OSS pricing for the agent + manager + indexer. If you can run the indexer yourself, the only cost is your operator time.	Query however you want. SPL, KQL, and SQL natively. Wazuh gives you OpenSearch DSL.
Established compliance posture. PCI DSS / HIPAA / NIST 800-53 mappings are mature and accepted by auditors.	AI security. caver-aisec has no Wazuh equivalent.
Active-response framework. Lightweight SOAR built into the agent (block IP, kill process, quarantine file).	OT / ICS. caver-industrial has no Wazuh equivalent.
Community + Wazuh Cloud. Real ecosystem, real managed-service option.	Daily-updated content packs with bundled dashboards + data inputs + OCSF field mapping. Wazuh’s vendor coverage is broad but variable in depth.
	Transparent commercial license-key on the analytics tier vs OSS+paid-cloud bimodal pricing.

How to decide

For most teams: deploy both. Run Wazuh agent on your endpoints. Land the events in Caver. Use Wazuh for endpoint coverage (FIM, CIS, CVE, OS-level events) and Caver for storage, search, AI security, OT/ICS, and cross-source correlation.

For shops already running Wazuh end-to-end (Wazuh agent + Wazuh manager + Wazuh indexer): Caver replaces just the indexer + analytics tier. Keep everything else.

For shops greenfield-evaluating SIEMs: Wazuh alone is a strong free starting point. The reasons to add Caver are scale economics, query-language flexibility, AI security, or OT/ICS coverage. None of those are urgent at small scale; all of them become urgent at scale.

For shops with regulated workloads where the “free OSS” answer doesn’t fly: Caver gives you the commercial backing, transparent licensing, and analytics tier that procurement can sign off on. Wazuh stays as the endpoint layer.

Talk to us about scoping.

Caver vs Dragos

Caver compared to Dragos Platform for OT/ICS buyers. Where Dragos's decade of OT focus wins, where unified IT plus OT in one stack wins.

Dragos is the established OT/ICS security platform. A decade of industrial focus, deep ICS-vendor relationships, well-known threat research (the WorldView intel, the CHERNOVITE / VOLTZITE attributions), and a large federal customer base. If you’re an OT-only buyer evaluating OT-only platforms, Dragos is the incumbent.

Caver answers a different procurement question: do you want one stack that covers IT and OT, or two stacks? caver-industrial extends the same Caver lakehouse SIEM with passive industrial-protocol decoders and OT-aware detection content, so OT and IT events land in one analytics tier with one query language.

At a glance

	Dragos Platform	Caver (caver-industrial + caver-collector)
Buyer focus	OT-only specialist.	IT and OT in one stack (convergence buyer).
Deployment model	Appliance and virtual sensor; vendor-managed and on-prem options.	On-prem lakehouse, air-gap friendly. caver-collector pipeline on the OT side, caver storage and analytics on the IT side.
Passive protocol coverage	Broad and mature: 25+ industrial protocols including DNP3, Modbus, IEC 60870-5-104, IEC 61850 GOOSE/SV, S7Comm, EtherNet/IP, OPC-UA, BACnet, plus vendor-specific dialects.	Active decoders for BACnet/IP, S7Comm, IEC 60870-5-104, DNP3, Modbus TCP, EtherNet/IP, OPC-UA. Roadmap aligns with the published industrial integration order.
Asset inventory	First-class, mature, with vendor-firmware mapping.	Native asset inventory built from passive decoder output and partner ingestion.
Industrial threat intel	WorldView intel program with named-threat attribution and quarterly releases.	Curated TI feeds focused on industrial CVEs and adversary TTPs. Updates daily through the Caver content pipeline rather than quarterly.
Framework alignment	NIST 800-82, IEC 62443, NERC CIP.	NIST 800-82, IEC 62443. Roadmap includes NERC CIP content.
IT-side coverage	Limited; integrates with IT SIEMs rather than being one.	Native. Caver is an IT SIEM that also covers OT. No second stack.
AI security visibility	None.	caver-aisec: prompt-injection detection, AI Observatory, NIST AI 100-2 + OWASP feeds.
Query languages	Platform-native UI and queries.	SPL, KQL, SQL natively against the Parquet lake. AI agents over MCP.
Pricing model	Enterprise procurement; per-asset and per-site licensing typical.	Transparent per-deployment license-key. Industrial pricing marketed Custom.
Update cadence	Quarterly platform releases plus intel updates.	Daily content updates through the pipeline.

Strengths

Where Dragos wins	Where Caver wins
A decade of OT-only focus. When the entire product roadmap is industrial, the depth shows. Protocol coverage, vendor-firmware mappings, and threat research are mature in a way a younger product cannot match.	One stack, not two. Caver is an IT SIEM that also handles OT. You don’t run separate query languages, separate dashboards, separate alerting tiers for the IT and OT halves of the same investigation.
Named-threat attribution. WorldView’s adversary profiles (CHERNOVITE, VOLTZITE, ELECTRUM, and others) carry weight with executive readers and federal buyers.	Transparent licensing. Per-deployment license-key with a published model on the Caver landing page. Dragos requires enterprise procurement.
Established federal and asset-owner base. If your procurement requires Dragos-by-name references in critical infrastructure, that’s a real moat.	Lakehouse economics. Caver stores everything as Parquet on object storage. The cost ceiling that keeps OT teams from keeping more than 90 days of historian data does not apply.
OT-vendor relationships. Deep integrations with Schneider, Siemens, GE, Honeywell, Rockwell, ABB built over years of co-engagement.	Native AI security. caver-aisec is part of the same stack. OT teams that are starting to deploy LLM-assisted maintenance copilots have one place to monitor that traffic.
OT-only specialization is sometimes the right answer. Some buyers do not want their OT analytics commingled with IT log volume; Dragos lets them buy an OT-only platform without that tradeoff.	Daily content updates. Caver’s content pipeline ships daily. Dragos’s platform releases are quarterly.
	Convergence is the trend. OT and IT are merging operationally (remote engineering access, cloud-connected historians, IT-IDS in the OT DMZ). Buyers who plan for the convergence buy convergence-native tools.

How to decide

For OT-only shops with a mature, separately-staffed OT security program and no plans to converge with IT: Dragos is the safer pick. A decade of OT focus is hard to argue with.

For shops where the same team owns IT and OT, or where the IT SOC is being asked to cover OT events as part of the convergence push: Caver removes a stack. One query language, one analytics tier, one license.

For shops doing a competitive bake-off: the honest comparison is depth-of-OT vs breadth-of-coverage. Dragos is deeper on OT specifically. Caver is broader across IT, OT, and AI security in one place. Map that to which procurement question is louder for you.

For shops where industrial pricing is a budget blocker: Caver’s per-deployment license-key is published, and industrial pricing is quoted Custom but built on the same transparent model rather than enterprise per-asset arithmetic.

Talk to us about scoping — or read about caver-industrial.

caver-aisec vs Lakera

caver-aisec compared to Lakera Guard. Why these two are complementary, not competitive, for most AI-security buyers.

caver-aisec and Lakera live in adjacent halves of the AI-security problem. Lakera is the inline guard at the prompt boundary. caver-aisec is the runtime detection + SOC correlation layer. For most teams the right answer is to use both, not pick one.

At a glance

	Lakera	caver-aisec
Posture	Inline guard. Intercepts prompts, scores, blocks or allows before the LLM call.	Runtime detection. Sees prompts after the LLM call, alerts, correlates with the rest of your security telemetry.
Latency profile	Sub-100ms; sits in the request path.	Asynchronous; doesn’t add latency to the LLM call itself.
Deployment	SaaS-first.	Self-hosted. On-prem, air-gapped, or cloud.
PII detection	Built-in PII detector with configurable redaction.	Shipped: Presidio integration with per-tenant policy and redaction.
Hallucination scoring	Faithfulness / groundedness scoring on RAG outputs.	Shipped: Ragas + TruLens integration with per-response scoring surfaced to SLAM notables.
Red-team testing	Lakera Red for adversarial probing.	Shipped: Garak + PyRIT scheduled probes with MITRE ATLAS coverage matrix.
Inline blocking	Lakera Guard, primary product.	Shipped: NeMo Guardrails + Rebuff runtime allow / deny / rewrite on the LLM-to-tool boundary.
SOC integration	Limited. AI-side product, not SOC-side.	First-class. AI events flow into Caver alongside identity, endpoint, network telemetry.
Cross-source correlation	None native.	Native. Prompt-injection attempt correlates with the IP, identity, endpoint, and tool-call activity around it.
AI Observatory / spend tracking	Limited.	First-class per-tenant LLM spend tracking with budget alerts.
Alert channels	Webhook + integrations.	PagerDuty, Discord, Teams, Slack, Telegram, SMTP, webhook.
Threat feeds	Lakera-curated.	NIST AI 100-2, OWASP Agentic AI Top 10, OWASP ML Top 10, HuggingFace Security, vendor advisories.
MCP tool-call audit	Not a focus.	First-class: LLM-to-MCP bridge instrumentation with CAVERN detection content.
Pricing	SaaS subscription.	Per-deployment commercial license-key.
Open source posture	Closed-source commercial.	Closed-source commercial. Built on open-source components (planned: NeMo Guardrails, Presidio, Ragas, Garak).

Where Lakera wins

Inline blocking is shipping today. Lakera Guard intercepts and blocks at the prompt boundary in production now. caver-aisec’s inline-block ticket is open but unshipped.
PII detection is shipping today. Same.
Lakera Red is shipping today. Pre-deploy adversarial test framework.
SaaS-first delivery. Fast pilots, zero infrastructure.
Established AI-security brand. Lakera holds mindshare in the inline-guard category.
Browser-side Chrome extension for shadow-AI catch is real and useful.

Where caver-aisec wins

SOC correlation. A prompt-injection attempt that touches identity, network, endpoint, and tool-call activity should show up in one queryable timeline. caver-aisec puts AI traffic next to the rest of your security telemetry; Lakera doesn’t.
AI Observatory. Per-tenant LLM spend tracking with budget alerts. Lakera doesn’t track spend.
Self-host / air-gap. Lakera is SaaS-first. For regulated industries, government, OT-adjacent, or air-gapped environments, Lakera can’t deploy. caver-aisec can.
Alert channel breadth. 7 named channels (PagerDuty, Discord, Teams, Slack, Telegram, SMTP, webhook) vs Lakera’s webhook + integrations.
Threat feed breadth. Multiple public AI-security feeds ingested as detection content, not just curated by the vendor.
MCP tool-call audit. Caver instruments the LLM-to-tool boundary specifically; useful for agentic AI deployments.

How to decide

Most teams: use both.

Lakera + caver-aisec is a deliberate combo. Lakera blocks at the prompt boundary. caver-aisec gives you SOC-side visibility into what was blocked, why, and what other telemetry correlates with the attempt. Same posture as a WAF + SIEM combo on the traditional web stack: the WAF blocks, the SIEM investigates.

Lakera-only is reasonable when: - You’re SaaS-only, no SOC, no regulated workloads. - AI security is your only security tool (small shop, AI-first product). - You need inline blocking today and can’t wait for caver-aisec parity (#24).

caver-aisec-only is reasonable when: - You need air-gap, on-prem, or data-residency reasons SaaS won’t satisfy. - You’re already running Caver as your SIEM and want AI security on the same operator surface. - You need OT-adjacent or industrial AI deployments where SaaS can’t deploy. - You want a single per-tenant spend + visibility surface, not just guarding.

Both is best when: - You want inline prevention + runtime detection + SOC correlation. - You’re at the scale where a missed prompt-injection costs more than two vendor relationships.

Talk to us about scoping.

caver-aisec vs Mindgard

caver-aisec compared to Mindgard. Pre-deployment red-teaming vs runtime detection, complementary not competitive.

caver-aisec and Mindgard solve different halves of the AI-security problem. Mindgard tests AI systems before deployment with automated red-teaming. caver-aisec detects attacks at runtime and correlates them with your SOC telemetry. Same defense-in-depth posture as a vulnerability scanner + EDR pairing on the traditional security stack: scanner finds the exposures, EDR catches the exploits.

At a glance

	Mindgard	caver-aisec
Posture	Pre-deployment + scheduled continuous testing. Probes target AI systems with adversarial inputs to find weaknesses.	Runtime detection. Observes deployed AI systems and detects attacks in flight.
Test catalog	Curated test pack covering jailbreak, prompt injection, model extraction, training-data leakage, supply chain.	Detection content pack for the same categories, but observed at runtime, not probed pre-deploy.
ATT&CK-for-ML mapping	Formal mapping of test coverage to MITRE ATLAS techniques.	Shipped: Garak + PyRIT integration with MITRE ATLAS technique coverage matrix and OWASP LLM / Agentic / ML Top 10 tagging across the detection corpus.
Deployment	SaaS platform.	Self-hosted. On-prem, air-gapped, or cloud.
Continuous testing	Scheduled probes against deployed AI endpoints.	Shipped: scheduled Garak + PyRIT probes against your deployed endpoints, with results flowing into Caver alongside runtime telemetry for cross-source correlation.
SOC integration	Limited. AI-testing product, not SOC-side.	First-class. AI events + test results flow into Caver alongside identity, endpoint, network telemetry.
Cross-source correlation	None native.	Native. Failed probe correlates with deployed model version, the prompt patterns that triggered it, and the rest of your security telemetry.
AI Observatory / spend tracking	Limited.	First-class per-tenant LLM spend tracking with budget alerts.
Alert channels	Email + integrations.	PagerDuty, Discord, Teams, Slack, Telegram, SMTP, webhook.
Threat feeds	Mindgard-curated test catalog.	NIST AI 100-2, OWASP Agentic AI Top 10, OWASP ML Top 10, HuggingFace Security, vendor advisories.
MCP tool-call audit	Not a focus.	First-class: LLM-to-MCP bridge instrumentation.
Pricing	SaaS subscription.	Per-deployment commercial license-key.

Where Mindgard wins

SaaS-first delivery. Fast pilots, zero infrastructure.
Established AI red-team brand. Mindgard holds mindshare in the pre-deploy testing category and ships a polished probe authoring UI.
Curated proprietary test catalog. Their in-house research team adds novel attacks the open Garak / PyRIT corpora may lag on.

Where caver-aisec wins

Pre-deploy probes + runtime detection in one stack. Shipped: Garak + PyRIT probes run on a schedule against your deployed AI endpoints, results land in Caver alongside runtime telemetry, the same operator surface investigates both. You don’t pick test-first vs detect-first, you get both.
Formal MITRE ATLAS coverage matrix. Shipped: per-technique coverage report driven by the Garak + PyRIT corpus + Caver’s hand-authored detection content, with OWASP LLM, Agentic, and ML Top 10 tagging cross-referenced.
Runtime detection + SOC correlation. A failed probe is a hypothesis; a successful exploit in production is the actual attack. caver-aisec catches the latter and correlates it with the rest of your telemetry.
AI Observatory. Per-tenant LLM spend tracking with budget alerts. Mindgard doesn’t track spend; testing is its product.
Self-host / air-gap. Mindgard is SaaS. For regulated industries, government, OT-adjacent, or air-gapped environments, Mindgard can’t deploy. caver-aisec can.
Alert channel breadth. 7 named channels vs email + integrations.
Threat feed breadth. Multiple public AI-security feeds ingested as runtime detection content, not just as testing inputs.
MCP tool-call audit. Runtime instrumentation of the LLM-to-tool boundary; useful for agentic AI deployments.

How to decide

Most teams: use caver-aisec.

caver-aisec now ships both halves: pre-deploy probes (Garak + PyRIT, scheduled, with MITRE ATLAS coverage) and runtime detection on the same operator surface. The same notable surface investigates both probe failures and live exploits. For a team building an AI security program from one stack, this is the simpler path.

Mindgard remains a good fit when: - Your AI security mandate is testing-only and you want a polished SaaS authoring UI. - You’re a security testing firm or red-team consultancy and runtime detection is your customer’s problem, not yours. - You don’t have a SOC and aren’t building one. - You specifically value Mindgard’s in-house proprietary test catalog over the open Garak + PyRIT corpora.

caver-aisec-only is reasonable when: - You need air-gap, on-prem, or data-residency reasons SaaS won’t satisfy. - You already have a SIEM and want AI security on the same operator surface. - Your AI systems are deployed and you need to know what’s happening to them now, not what could happen in a lab.

Both is best when: - You ship AI systems to production and want both prevention (testing) and detection (runtime). - You operate at the scale where a missed AI attack costs more than two vendor relationships.

Talk to us about scoping.

OCSF, design choices

Caver's storage layer is open OCSF (Open Cybersecurity Schema Framework) Parquet on any S3-compatible object store. This page explains what that means, why we picked it, and the other architectural calls we made along the way.

What is OCSF?

OCSF (Open Cybersecurity Schema Framework) is a vendor-neutral, open-source schema for security data. Originally announced by Splunk and AWS at Black Hat 2022 and now maintained at the OCSF Project on GitHub under the Linux Foundation, it has broad industry support: AWS, Splunk, IBM, CrowdStrike, Cloudflare, JupiterOne, Tanium, Zscaler, Salesforce, Okta, Trend Micro, DTEX, and dozens more.

OCSF defines a normalised event shape for every category of security telemetry: authentication, network activity, process activity, file activity, application activity, vulnerabilities, findings, configuration, audit. Every event has a stable schema with class_uid, category_uid, severity_id, activity_id, ATT&CK tagging, actor / target / source structures, and an OCSF-defined version field.

OCSF class	Class UID	Examples
Authentication	3002	Okta logon, AWS Console login, SSH auth
Network Activity	4001	Firewall connection, VPN tunnel, DNS query
Process Activity	1007	EDR process start, sysmon event 1
File Activity	1001	File write, file delete, file rename
Application Activity	6005	LLM API call, SaaS audit event
Security Finding	2004	CAVERN detection fire, Sigma rule match
Vulnerability Finding	2002	Tenable scan result, Wiz CVE finding
Cloud API	3005	CloudTrail event, Azure activity log

Why we chose OCSF

Vendor-neutral by design

OCSF was built by a coalition that includes both Caver's competitors and partners. No single vendor controls the schema. That means our customers are not locked into our format the way Splunk customers are locked into tsidx, or QRadar customers are locked into Ariel.

Already broadly emitted

AWS Security Lake, Splunk, CrowdStrike, Cloudflare, and many others emit OCSF natively. New sources arrive normalised, with no per-vendor mapping effort. Vendors not yet OCSF-native get a one-time normaliser in caver-collector.

SQL-friendly column shape

OCSF's tabular projection maps cleanly onto Parquet columns. Detection rules and analyst queries get column-level predicate pushdown, dictionary encoding for the six enum columns, and ten to one hundred times faster scan rates compared to row-store SIEMs on the same data volume.

ATT&CK and CIM-compatible

OCSF carries MITRE ATT&CK technique and tactic tags as first-class fields. Splunk CIM dashboards and Sigma rules port across cleanly. KQL queries written against Azure Sentinel run against OCSF with light field-name remap.

Why Parquet on object storage

The second half of the storage decision: store OCSF as Apache Parquet on S3 / MinIO / R2 / GCS, not in a proprietary index format.

Format	Vendor lock	Cold-storage cost	Cross-tool query	Caver pick
Splunk tsidx	Hard	High (replicated indexers)	None	No
Elastic shards	Hard	Medium	Limited	No
QRadar Ariel	Hard	High	None	No
OCSF Parquet on S3	None (Apache 2.0)	Lowest (~$0.023/GB)	DuckDB, Trino, Athena, Spark, Pandas, anything	Yes

Same files are queryable by Caver, by your data team via Athena or Snowflake external tables, and by your ML/AI team via PyArrow or Spark. The data lives once.

Other architectural choices

Choice	Caver pick	Why
Detection rule format	CAVERN YAML (Sigma-shaped)	Sigma is the community standard. CAVERN rules transpile cleanly from Sigma and back. Operators do not need to learn a proprietary DSL.
Query layer	Multi-language (SPL, SQL, KQL, Sigma, PromQL, NL)	Teams pick the language they already know. Federation across tools is a first-class capability, not a migration project.
Pipeline backend	Vector (Rust) + Python + OpenTelemetry, equally featured	No single backend optimises for every use case. Vector for hot paths, Python for SaaS API polling, OTel for orgs already running OTel infrastructure.
Schema enforcement	OCSF validators with coerce / reject / DLQ modes	Bad events go to a DLQ, not into the lake. Schema drift surfaces as a notable, not a silent corruption.
AI integration	MCP server, native primitive registry	AI agents are first-class operators, not chatbots bolted on. The same primitives the orchestrator uses are exposed via MCP for any client.
Tenancy	Per-tenant S3 prefix + row filter + LLM config	MSSPs and large enterprises get hard tenant isolation at the storage layer, not just at the query layer.
Identity	OIDC / SAML / LDAP, role-based access control with audit	No custom auth. Plug Caver into the IdP you already use.

External references

OCSF Schema Browser, browse every class and field
OCSF Project home, governance and contributors
OCSF on GitHub, schema source
Apache Parquet, columnar storage format
SigmaHQ/sigma, the community detection rule corpus

Deployment

Installation

# Core install
pip install caver

# Install with specific roles
pip install 'caver[cavern-engine,slam-engine,scheduler]'
pip install 'caver[search-head]'
pip install 'caver[mcp-server]'
pip install 'caver[all-roles]'  # dev / dogfood only

# Caver-Collector
pip install caver-collector
pip install 'caver-collector[parquet]'  # adds PyArrow for Parquet sink

Configuration (caver.toml)

[roles]
enabled = ["search-peer", "cavern-engine", "slam-engine", "scheduler"]

[storage]
bucket = "caver-lake"
endpoint = "http://localhost:9000"   # MinIO / LocalStack / R2
access_key_env = "MINIO_ACCESS_KEY"
secret_key_env = "MINIO_SECRET_KEY"

[cluster]
discovery = "static"
peers = ["localhost"]

[cavern]
content_path = "/etc/caver/content/v1"
rba.notable_threshold = 100

[slam]
oncall_telegram_chat_id = "YOUR_CHAT_ID"
telegram_token_env = "TELEGRAM_BOT_TOKEN"

[caver.ai]
user_daily_cost_usd = 50

Available roles

search-head

search-peer

indexer

scheduler

cavern-engine

slam-engine

echo-engine

uba-engine

mcp-server

aisec-runtime

collector

deployment-server

monitor

license-master

Splunk Peer Mode

Register Caver as a distributed search peer of your existing Splunk SH. No Splunk source or forwarder changes required. Caver receives SPL, scans the OCSF Parquet lake, and returns results over the splunkd wire format.

caver-cluster join   --splunk-host splunk.internal   --splunk-port 8089   --splunk-user admin   --splunk-pass "$SPLUNK_PASS"   --peer-uri https://caver.internal:8089

Any licensing change to your Splunk contract is made through Splunk per your existing agreement. Caver does not modify, patch, or bypass any Splunk license mechanism.

Health Check

caver-doctor --role search-peer
caver-doctor --role cavern-engine --config /etc/caver/caver.toml
# → exits 0 if healthy, 1 with structured error output if not

Docker / Kubernetes

docker run --rm -it \
  -e MINIO_ACCESS_KEY=minioadmin \
  -e MINIO_SECRET_KEY=minioadmin \
  -v /etc/caver:/etc/caver \
  ghcr.io/redeyesecurity/caver:latest \
  --config /etc/caver/caver.toml

# Helm
helm install caver oci://ghcr.io/redeyesecurity/charts/caver \
  --set storage.bucket=caver-lake \
  --set storage.endpoint=http://minio:9000

Configuration Reference

Storage

Key	Default	Description
`storage.bucket`	none	S3/MinIO bucket for OCSF lake
`storage.endpoint`	AWS S3	Override for MinIO / LocalStack / R2
`storage.prefix`	`""`	Key prefix for all lake writes
`storage.region`	`us-east-1`	AWS region

CAVERN

Key	Default	Description
`cavern.content_path`	bundled	Path to `v1/` content directory
`cavern.rba.entity_window_hours`	`24`	Rolling window for risk score aggregation
`cavern.rba.notable_threshold`	`100`	Score threshold to create a SLAM notable
`cavern.rba.admin_multiplier`	`2.0`	Score multiplier for admin accounts
`cavern.rba.critprod_multiplier`	`1.875`	Score multiplier for critical-prod assets

SLAM

Key	Default	Description
`slam.oncall_telegram_chat_id`	none	Telegram chat ID for oncall alerts
`slam.sla.p1_minutes`	`15`	P1 notable SLA in minutes
`slam.sla.p2_minutes`	`60`	P2 notable SLA in minutes
`slam.playbook_path`	bundled	Path to custom playbooks directory

AI config

Key	Default	Description
`caver.ai.user_daily_cost_usd`	`50`	Per-user AI spend threshold (USD) for `cost_anomaly` rule
`caver.ai.off_hours_start`	`22`	Off-hours start (local hour) for token spike rule
`caver.ai.off_hours_end`	`6`	Off-hours end (local hour)
`caver.ai.variance_model_count`	`5`	Model count threshold for `model_variance_burst`

Multi-tenancy (MSSP)

[[tenants]]
id = "acme"
display_name = "Acme Corp"
s3_prefix = "tenants/acme/"
row_filter = "metadata.tenant_uid = 'acme'"

[tenants.llm]
provider = "anthropic"
model = "claude-sonnet-4-6"
api_key_env = "ACME_ANTHROPIC_KEY"