# OpenSearch

Full-text search and analytics engine with OpenSearch Dashboards. Serves as a centralized SIEM platform when the `siem` attribute is enabled, receiving security events from Falco, OPA, APISIX, Audit, Trivy, and OTEL sources across one or more stacks.

## Architecture

- **OpenSearch Cluster**: data nodes (StatefulSet) for indexing and search
- **OpenSearch Dashboards**: visualization UI (Deployment)
- **SIEM Bootstrap Job**: ArgoCD PostSync hook that imports index templates, dashboards, alerting monitors, and ISM policies
- **Prometheus Exporter**: metrics exporter for cluster health monitoring

## Key Attributes

| Attribute | Example | Description |
|---|---|---|
| `namespace` (required) | `opensearch` | Kubernetes namespace for all OpenSearch resources |
| `admin_password` (required) | `MyStr0ngP@ss` | Admin user password. Used by all sources (Falco, OPA, etc.) for authentication, and by the SIEM bootstrap job. |
| `siem` | `true` | Master gate for SIEM functionality. When `"true"`, enables the entire SIEM pipeline: index templates, dashboards, alerting monitors, ISM policies, and the bootstrap job. When not set, inbound security links still populate dunders, but no SIEM files are generated. |
| `remote_host` | `opensearch.sre.example.com` | Cross-stack endpoint: the FQDN/IP for external access to this OpenSearch instance. Sources in other stacks (Falco, OPA, Trivy) use it to connect via an APISIX route or load balancer on port 443 instead of the internal service on port 9200. |

## SIEM Platform

When `siem: "true"` is set, OpenSearch becomes a centralized Security Information and Event Management (SIEM) platform. Sources from the same stack or from external stacks send security events, and the bootstrap job configures everything automatically.

### SIEM Data Flow

- Falco (runtime threats) → Falcosidekick → OpenSearch (`falco-*` index)
- OPA (policy violations) → OTEL sidecar → OpenSearch (`opa*` index)
- APISIX (API gateway logs) → OTEL collector → OpenSearch (`apisix-*` index)
- Audit (K8s audit logs) → OTEL collector → OpenSearch (`audit-*` index)
- Trivy (vulnerability scans) → Trivy exporter → OpenSearch (`trivy-*` index)
- OTEL (traces/logs/metrics) → OTEL collector → OpenSearch (`otel-*` index)

### Inbound SIEM Links

| Link Type | Index Pattern | Alerting Monitor | Export Method |
|---|---|---|---|
| `falco-opensearch` | `falco-*` | `falco-critical-events` | Falcosidekick (native output) |
| `opa-opensearch` | `opa*` | `opa-violation-events` | OTEL sidecar (requires `opa-otel` link on the OPA side) |
| `apisix-opensearch` | `apisix-*` | `apisix-error-events` | OTEL collector |
| `audit-opensearch` | `audit-*` | `audit-privileged-events` | OTEL collector |
| `trivy-opensearch` | `trivy-*` | `trivy-critical-vulnerabilities` | Trivy exporter (direct to OpenSearch) |
| `otel-opensearch` | `otel-*` | (none) | OTEL collector |

### Per-Source SIEM Files

For each detected source link, the following files are generated (and removed in `post_gen` if the link doesn't exist):

| File Pattern | Purpose |
|---|---|
| `siem/index-template-{source}.json` | OpenSearch index template with field mappings for the source |
| `siem/alerting-{source}.json` | OpenSearch alerting monitor definition (critical event detection) |
| `siem/dashboards-{source}.ndjson` | OpenSearch Dashboards saved objects (index pattern + visualizations + dashboard) |

### Always-Generated SIEM Files

| File | Purpose |
|---|---|
| `siem/ism-policy.json` | PCI DSS ISM retention policy (applied to all SIEM indices) |
| `siem/dashboards-pci-360.ndjson` | Global PCI-360 Overview dashboard (cross-stack, fixed IDs) |
| `siem/dashboards-pci-overview.ndjson` | Per-stack PCI Overview dashboard template |
| `siem-bootstrap-job.yaml` | ArgoCD PostSync Job that imports all SIEM resources into OpenSearch on every sync |

**post_gen cleanup:** If `siem` is not `"true"` or no SIEM source links exist, the entire `k8s/deploy/base/siem/` directory and `siem-bootstrap-job.yaml` are removed. Per-source files (index template, alerting, dashboard) are removed individually when their corresponding link doesn't exist.
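The cleanup rules can be sketched in Python (a minimal illustration; `cleanup_siem_files`, its arguments, and the returned path list are hypothetical, and the real hook deletes files directly rather than returning paths):

```python
from pathlib import Path

SIEM_SOURCES = ["falco", "opa", "apisix", "audit", "trivy", "otel"]

def cleanup_siem_files(base: Path, siem_enabled: bool, linked_sources: set) -> list:
    """Return the paths that the post_gen cleanup rules would remove."""
    removed = []
    siem_dir = base / "k8s/deploy/base/siem"
    job = base / "siem-bootstrap-job.yaml"
    if not siem_enabled or not linked_sources:
        # No SIEM at all: drop the whole siem/ directory and the bootstrap job.
        removed.append(str(siem_dir))
        removed.append(str(job))
        return removed
    # SIEM enabled: remove only per-source files whose link does not exist.
    for source in SIEM_SOURCES:
        if source in linked_sources:
            continue
        for name in (f"index-template-{source}.json",
                     f"alerting-{source}.json",
                     f"dashboards-{source}.ndjson"):
            removed.append(str(siem_dir / name))
    return removed
```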

## Multi-Stack SIEM (Cross-Stack Sources)

Multiple stacks can send security events to a single centralized OpenSearch instance. Each external stack that links its Falco/OPA/APISIX/Audit/Trivy components creates a cross-stack connection.

### How Cross-Stack Detection Works

`pre_gen_project.py` detects each inbound SIEM link:

```python
if link.component.is_referenced:
    # External source from another stack
    stack_name = link.component.name.replace('-falco', '')
    # e.g., "prod-falco" → stack_name = "prod"
    __siem_stacks[stack_name].append("falco")
else:
    # Local source (same stack)
    __siem_local_sources.append("falco")
```
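Generalized over all source types, the same split can be sketched as a runnable example (the `Component` dataclass and `detect_siem_sources` helper are stand-ins for the real Stacktic link objects, not the actual hook code):

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Component:
    name: str            # e.g. "prod-falco"
    is_referenced: bool  # True when the component lives in another stack

def detect_siem_sources(links):
    """Split inbound SIEM links into local sources and per-stack external sources."""
    siem_stacks = defaultdict(list)  # stack slug -> list of source types
    local_sources = []
    for component, source_type in links:
        if component.is_referenced:
            # External: strip the "-{source}" suffix to recover the stack slug.
            stack = component.name.removesuffix(f"-{source_type}")
            siem_stacks[stack].append(source_type)
        else:
            local_sources.append(source_type)
    return dict(siem_stacks), local_sources

stacks, local = detect_siem_sources([
    (Component("prod-falco", True), "falco"),
    (Component("falco", False), "falco"),
    (Component("staging-opa", True), "opa"),
])
# stacks == {"prod": ["falco"], "staging": ["opa"]}, local == ["falco"]
```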

### Stack-Aware Dashboard Generation

`post_gen_project.py` generates per-stack dashboard variants with prefixed IDs and titles to prevent collisions:

| Stack | Generated File | Dashboard ID | Title |
|---|---|---|---|
| `sre` (local) | `dashboards-falco-sre.ndjson` | `sre-falco-pci-dashboard` | sre - PCI - Runtime |
| `prod` (external) | `dashboards-falco-prod.ndjson` | `prod-falco-pci-dashboard` | prod - PCI - Runtime |
| `staging` (external) | `dashboards-opa-staging.ndjson` | `staging-opa-pci-dashboard` | staging - PCI - Policy |

**No collision:** Each stack's dashboards have unique IDs (prefixed with the stack slug). The `?overwrite=true` on import only overwrites the same stack's dashboards, not others. The global PCI-360 Overview has fixed IDs and is always safely overwritten.
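The prefixing step can be illustrated with a short sketch (the saved-object layout here is simplified; real ndjson lines carry more fields than `id`, `type`, and `attributes.title`):

```python
import json

def prefix_ndjson_ids(ndjson_text: str, stack: str) -> str:
    """Prefix saved-object ids and titles with the stack slug to avoid collisions."""
    out = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        obj = json.loads(line)
        obj["id"] = f"{stack}-{obj['id']}"
        attrs = obj.get("attributes", {})
        if "title" in attrs:
            attrs["title"] = f"{stack} - {attrs['title']}"
        out.append(json.dumps(obj))
    return "\n".join(out)

base = '{"id": "falco-pci-dashboard", "type": "dashboard", "attributes": {"title": "PCI - Runtime"}}'
result = prefix_ndjson_ids(base, "prod")
# id becomes "prod-falco-pci-dashboard", title becomes "prod - PCI - Runtime"
```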

## SIEM Bootstrap Job

The ArgoCD PostSync hook (`siem-bootstrap-job.yaml`) runs on every sync and performs:

| Step | Action | API Endpoint |
|---|---|---|
| 1 | Wait for OpenSearch readiness (600s timeout) | init container: `curl /_cluster/health` |
| 2 | Create the ISM retention policy (`pci-dss-retention`) | `PUT /_plugins/_ism/policies/pci-dss-retention` |
| 3 | Create index templates per source | `PUT /_index_template/{source}` |
| 4 | Apply the ISM policy to existing indices | `POST /_plugins/_ism/add/{source}-*` |
| 5 | Create alerting monitors | `POST /_plugins/_alerting/monitors` |
| 6 | Create placeholder indices (`{source}-init`) if no indices exist yet | `PUT /{source}-init` |
| 7 | Cleanup: delete only previously imported saved objects (IDs extracted from the ndjson files plus a legacy ID list); customer dashboards are never touched | `DELETE /api/saved_objects/{type}/{id}` |
| 8 | Import dashboards (per-stack + global PCI-360) | `POST /api/saved_objects/_import?overwrite=true` |
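The sequence above can be expressed as a small plan-builder sketch (illustrative only; `bootstrap_plan` is not part of the job, which issues these calls with curl from a shell script):

```python
def bootstrap_plan(sources):
    """Return the (method, path) sequence the bootstrap job walks through."""
    plan = [("GET", "/_cluster/health")]                                # step 1: readiness probe
    plan.append(("PUT", "/_plugins/_ism/policies/pci-dss-retention"))   # step 2: ISM policy
    for s in sources:
        plan.append(("PUT", f"/_index_template/{s}"))                   # step 3: index templates
        plan.append(("POST", f"/_plugins/_ism/add/{s}-*"))              # step 4: attach ISM
    plan.append(("POST", "/_plugins/_alerting/monitors"))               # step 5: alerting monitors
    for s in sources:
        plan.append(("PUT", f"/{s}-init"))                              # step 6: placeholder indices
    plan.append(("DELETE", "/api/saved_objects/{type}/{id}"))           # step 7: per-object cleanup
    plan.append(("POST", "/api/saved_objects/_import?overwrite=true"))  # step 8: import dashboards
    return plan
```

Note that steps 7 and 8 target OpenSearch Dashboards (port 5601), while the earlier steps hit the OpenSearch REST API (port 9200).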

## All Inbound Links

| Link Type | Dunder | Impact |
|---|---|---|
| `prometheus-opensearch` | `__prometheus` | Prometheus metrics exporter ServiceMonitor |
| `falco-opensearch` | `__falco` | SIEM: index template, alerting, dashboard for Falco events |
| `opa-opensearch` | `__opa` | SIEM: index template, alerting, dashboard for OPA violations |
| `apisix-opensearch` | `__apisix_logs` | SIEM: index template, alerting, dashboard for API gateway logs |
| `audit-opensearch` | `__audit` | SIEM: index template, alerting, dashboard for K8s audit logs |
| `trivy-opensearch` | `__trivy` | SIEM: index template, alerting, dashboard for vulnerability scans |
| `otel-opensearch` | `__otel` | SIEM: OTEL traces/logs/metrics ingestion |
| `opensearch-bucket` | `__bucket` | MinIO bucket for snapshot/backup storage |

## Connectivity: Internal vs External

All source components (Falco, OPA, Trivy, etc.) use the same pattern to connect to OpenSearch:

```
Internal (same stack/cluster):
  endpoint: https://opensearch-cluster-master.{namespace}.svc.cluster.local:9200
  tls: skip verify
  auth: admin / {admin_password}

External (cross-stack/cross-cluster):
  endpoint: https://{remote_host}:443
  tls: skip verify
  auth: admin / {admin_password}
```

Detection: `link.component.is_external` (`true` = external)

**remote_host setup:** Typically exposed via an APISIX route on the SRE/central cluster. The `remote_host` attribute is the FQDN that resolves to this route (e.g., `opensearch.sre.stack-3.source-lab.io`). Port 443 is used for HTTPS through the ingress controller.
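The endpoint selection can be sketched as follows (the function name and signature are hypothetical; only the two URL shapes come from the pattern above):

```python
def opensearch_endpoint(is_external: bool, namespace: str, remote_host: str = "") -> str:
    """Pick the OpenSearch endpoint the way source components do."""
    if is_external:
        # Cross-stack: go through the APISIX route / load balancer on 443.
        if not remote_host:
            raise ValueError("remote_host is required for cross-stack links")
        return f"https://{remote_host}:443"
    # Same stack: talk to the internal service on 9200.
    return f"https://opensearch-cluster-master.{namespace}.svc.cluster.local:9200"
```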

## Dashboard Update Procedure

### How Dashboards Are Generated

| Step | Phase | What Happens |
|---|---|---|
| 1 | Template | Base ndjson files in `siem/dashboards-{source}.ndjson` define saved objects with fixed IDs |
| 2 | post_gen | Copies the base ndjson per linked stack, prefixes IDs with the stack slug (e.g., `falco-pci-dashboard` → `pci-dss-stack-falco-pci-dashboard`), renames titles, removes the base files |
| 3 | Bootstrap Job | Deletes only previously imported saved objects (by exact ID), then imports per-stack ndjson files + PCI-360 |

### Standard Update Flow

1. Edit the base ndjson file(s) in `templates/opensearch/.../siem/dashboards-*.ndjson`
2. Regenerate the stack (Stacktic build)
3. Push to git / ArgoCD sync
4. The bootstrap job runs automatically: it cleans up old objects, then imports the new ones

### Why Cleanup Is Needed

When dashboard IDs change between template versions, `?overwrite=true` only updates objects with the same ID, so old IDs persist indefinitely in `.kibana_1`. The bootstrap job now extracts all IDs from the ndjson files it is about to import (plus a hardcoded list of legacy IDs from older versions) and deletes only those exact objects before importing. Customer-created dashboards are never touched.
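The ID-extraction step can be sketched as follows (a simplified stand-in for the job's actual logic; the legacy ID list is omitted):

```python
import json

def extract_saved_object_ids(ndjson_files: dict) -> list:
    """Collect (type, id) pairs from the ndjson files the job is about to import."""
    pairs = []
    for _name, text in ndjson_files.items():
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            obj = json.loads(line)
            if "id" in obj and "type" in obj:
                pairs.append((obj["type"], obj["id"]))
    return pairs

files = {
    "dashboards-falco-prod.ndjson":
        '{"type": "dashboard", "id": "prod-falco-pci-dashboard"}\n'
        '{"type": "index-pattern", "id": "prod-falco-index-pattern"}'
}
pairs = extract_saved_object_ids(files)
```

Each resulting pair maps to one `DELETE /api/saved_objects/{type}/{id}` call before the import runs.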

### Dashboard Naming Convention

| ID Pattern | Example | Source |
|---|---|---|
| `{stack}-pci-runtime` | `pci-dss-stack-pci-runtime` | Falco |
| `{stack}-pci-api-gw` | `pci-dss-stack-pci-api-gw` | APISIX |
| `{stack}-pci-policy` | `pci-dss-stack-pci-policy` | OPA |
| `{stack}-pci-audit` | `pci-dss-stack-pci-audit` | K8s Audit |
| `{stack}-pci-scan` | `pci-dss-stack-pci-scan` | Trivy |
| `{stack}-pci-overview` | `pci-dss-stack-pci-overview` | Per-stack overview |
| `pci-360-overview` | `pci-360-overview` | Global cross-stack (fixed ID) |
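A helper following this convention might look like the sketch below (hypothetical; shown only to make the stack-to-ID mapping explicit):

```python
# Dashboard ID suffix per source, per the naming convention table.
SOURCE_SUFFIX = {
    "falco": "pci-runtime",
    "apisix": "pci-api-gw",
    "opa": "pci-policy",
    "audit": "pci-audit",
    "trivy": "pci-scan",
}

def dashboard_id(stack: str, source: str) -> str:
    """Build a per-stack dashboard id from the stack slug and source type."""
    return f"{stack}-{SOURCE_SUFFIX[source]}"
```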

## Force Cleanup (Reset ISM + Delete SIEM Data)

### Full Reset (Destructive: All Event Data Lost)

```shell
# 1. Remove the ISM policy from all SIEM indices
curl -sk -X POST "https://localhost:9200/_plugins/_ism/remove/falco-*" ...
curl -sk -X POST "https://localhost:9200/_plugins/_ism/remove/opa*" ...
curl -sk -X POST "https://localhost:9200/_plugins/_ism/remove/apisix-*" ...
curl -sk -X POST "https://localhost:9200/_plugins/_ism/remove/audit-*" ...
curl -sk -X POST "https://localhost:9200/_plugins/_ism/remove/trivy-*" ...

# 2. Delete all SIEM indices
curl -sk -X DELETE "https://localhost:9200/falco-*" ...
curl -sk -X DELETE "https://localhost:9200/opa*" ...
curl -sk -X DELETE "https://localhost:9200/apisix-*" ...
curl -sk -X DELETE "https://localhost:9200/audit-*" ...
curl -sk -X DELETE "https://localhost:9200/trivy-*" ...

# 3. ArgoCD sync → the bootstrap job recreates everything
```

### Re-Apply ISM Only (Non-Destructive: Data Preserved)

```shell
# Remove, then re-add the ISM policy (per source)
curl -sk -X POST "https://localhost:9200/_plugins/_ism/remove/falco-*" ...
curl -sk -X POST "https://localhost:9200/_plugins/_ism/add/falco-*" -d '{"policy_id": "pci-dss-retention"}'

# Verify ISM state
curl -sk "https://localhost:9200/_plugins/_ism/explain/*?pretty" ...
```

**ISM Lifecycle:** hot (0–7d, rollover at 1d/30GB) → warm (7–90d, 1 replica, force-merged) → cold (90–365d, read-only, 0 replicas) → delete (365d+). New indices automatically pick up the ISM policy via `ism_template` patterns in the policy.
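Under these assumptions, the policy body might look roughly like the sketch below (field names follow the OpenSearch ISM policy schema; the exact generated `ism-policy.json`, including its `ism_template` patterns, may differ):

```python
# Sketch of the pci-dss-retention lifecycle; thresholds come from the
# description above, not from the generated file.
ism_policy = {
    "policy": {
        "description": "PCI DSS retention (sketch)",
        "default_state": "hot",
        "states": [
            {"name": "hot",
             "actions": [{"rollover": {"min_index_age": "1d", "min_size": "30gb"}}],
             "transitions": [{"state_name": "warm", "conditions": {"min_index_age": "7d"}}]},
            {"name": "warm",
             "actions": [{"replica_count": {"number_of_replicas": 1}},
                         {"force_merge": {"max_num_segments": 1}}],
             "transitions": [{"state_name": "cold", "conditions": {"min_index_age": "90d"}}]},
            {"name": "cold",
             "actions": [{"read_only": {}},
                         {"replica_count": {"number_of_replicas": 0}}],
             "transitions": [{"state_name": "delete", "conditions": {"min_index_age": "365d"}}]},
            {"name": "delete",
             "actions": [{"delete": {}}],
             "transitions": []},
        ],
        "ism_template": [{"index_patterns":
                          ["falco-*", "opa*", "apisix-*", "audit-*", "trivy-*", "otel-*"]}],
    }
}
```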

## Ports

| Port | Service | Protocol |
|---|---|---|
| 9200 | OpenSearch REST API (internal) | HTTPS |
| 9300 | OpenSearch node-to-node transport | TCP (TLS) |
| 5601 | OpenSearch Dashboards UI | HTTP |
| 443 | External access via APISIX route (cross-stack) | HTTPS |