Prometheus

Metrics collection, alerting, and multi-stack federation. Operates as either Master (central receiver) or Remote (forwarding writer). Auto-generates PrometheusRule alert manifests per linked component with cluster label injection for remote instances.

Architecture

Prometheus Server - Metrics collection + TSDB storage (:9090)
Alertmanager - Alert routing + notification dispatch (:9093)
Operator - CRD controller for Prometheus, Alertmanager, ServiceMonitor, PrometheusRule
Node Exporter - Host-level metrics (CPU, memory, disk, network) (:9100)
Kube State Metrics - Kubernetes object state metrics (optional)
PushGateway - Push-based metric ingestion for batch jobs (optional, :9091)

Multi-Stack: Master vs Remote

The prometheus_master attribute controls how this Prometheus instance operates. When you have multiple stacks, one Prometheus acts as the Master and receives metrics from all other stacks' Remote instances.

How multi-stack federation works:
1. Each stack has its own Prometheus. One is set as Master (prometheus_master: true), the rest are Remote.
2. Remote instances link to the Master via prometheus-prometheus (cross-stack, is_external).
3. The Remote automatically adds a cluster="{system_name}" label to all scraped metrics via scrapeClasses, which tags every metric with its stack of origin.
4. The Remote sends all metrics to the Master via remoteWrite, using the Master's remote_host, the link's api_key, and TLS with certificate verification skipped.
5. The Remote sends alerts to the Master's Alertmanager (the Master's remote_alerts URL) instead of its own.
6. In post-gen, every alert PromQL expression on a Remote instance gets cluster="{system_name}" injected, so each alert matches only metrics from its own stack.
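
Step 6 can be illustrated with a small sketch. This is not the generator's actual post-gen code, which is not shown here. The function name inject_cluster is hypothetical, and the sketch only rewrites label selectors that already have braces. Bare metric names without braces and label values that themselves contain braces are left untouched.

```python
import re

# Matches a label selector body such as {job="api"} (no nested braces).
_SELECTOR = re.compile(r"\{([^{}]*)\}")
# Detects an existing matcher on the "cluster" label (=, !=, =~ or !~).
_HAS_CLUSTER = re.compile(r"(^|,)\s*cluster\s*(=~|!~|!=|=)")


def inject_cluster(expr: str, system_name: str) -> str:
    """Add cluster="<system_name>" to every braced selector in expr.

    Hypothetical helper: assumes selectors are written with braces and
    that label values do not contain '{' or '}'.
    """
    matcher = f'cluster="{system_name}"'

    def _rewrite(match: "re.Match[str]") -> str:
        # Drop surrounding whitespace and a trailing comma, if any.
        inner = match.group(1).strip().rstrip(",").strip()
        if _HAS_CLUSTER.search(inner):
            return match.group(0)  # already scoped, leave unchanged
        if not inner:
            return "{" + matcher + "}"
        return "{" + inner + "," + matcher + "}"

    return _SELECTOR.sub(_rewrite, expr)


print(inject_cluster('rate(http_requests_total{job="api"}[5m]) > 0', "dev"))
```

For the "dev" stack this prints rate(http_requests_total{job="api",cluster="dev"}[5m]) > 0. Running the function a second time leaves the expression unchanged, because the selector already carries a cluster matcher.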

Behavior | Master (true) | Remote (false)
Remote Write Receiver | Enabled (accepts incoming metrics) | Disabled (sends metrics outbound)
TSDB Out-of-Order | outOfOrderTimeWindow: 1h (handles clock skew) | Not configured
scrapeClasses | None (no cluster labeling on local metrics) | cluster="{system_name}" added to all scraped metrics
remoteWrite | Not configured | Sends to Master's remote_host with API key + TLS
ruleSelector | {} (matches ALL PrometheusRules) | Scoped to own release label
Alertmanager Target | Local Alertmanager | Master's Alertmanager (remote_alerts)
Alert PromQL Injection | None (expressions unchanged) | cluster="{system_name}" injected into all alert PromQL (post-gen)
Alerts Directory | Not included in kustomization | Included if any alert link exists

Example: Stack "sre" has a Master Prometheus. Stacks "dev" and "staging" each have a Remote Prometheus instance linked to "sre-prometheus" via prometheus-prometheus. The dev and staging instances add cluster="dev" and cluster="staging" respectively to all metrics and forward everything to sre's Prometheus. All alerts route to sre's Alertmanager.
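
The Master and Remote differences can be sketched as a function that builds the relevant part of the Prometheus spec. This is only an illustration under assumptions: the function name, the scrape class name "cluster-label", and the release argument are hypothetical. The field names follow the Prometheus Operator's PrometheusSpec as commonly documented, and the generated helm-values.yaml may be laid out differently. Authentication with the link's api_key is left out because how the key is transmitted is not stated here.

```python
from typing import Any, Dict, Optional


def build_prometheus_spec(
    master: bool,
    system_name: str,
    release: str,
    remote_host: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a prometheusSpec-style dict for Master or Remote mode."""
    if master:
        return {
            # Master accepts incoming remote-write traffic.
            "enableRemoteWriteReceiver": True,
            # Tolerate samples arriving late from other stacks.
            "tsdb": {"outOfOrderTimeWindow": "1h"},
            # Empty selector matches all PrometheusRules.
            "ruleSelector": {},
        }

    spec: Dict[str, Any] = {
        "scrapeClasses": [
            {
                "name": "cluster-label",  # hypothetical name
                "default": True,
                # Static relabeling: sets cluster=<system_name> on every target.
                "relabelings": [
                    {"targetLabel": "cluster", "replacement": system_name}
                ],
            }
        ],
        # Assumption: "own release label" means a match on the release label.
        "ruleSelector": {"matchLabels": {"release": release}},
    }
    if remote_host:
        # No remote_host on the Master means no remoteWrite on the Remote.
        spec["remoteWrite"] = [
            {"url": remote_host, "tlsConfig": {"insecureSkipVerify": True}}
        ]
    return spec


remote = build_prometheus_spec(
    master=False,
    system_name="dev",
    release="prometheus",
    remote_host="https://prom.sre.io/api/v1/write",
)
print(remote["remoteWrite"][0]["url"])
```

Calling the function with master=True returns only the receiver, TSDB, and ruleSelector settings. A Remote without a remote_host keeps its scrapeClasses and ruleSelector but gets no remoteWrite entry.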

Attributes

Attribute | Example | Description
namespace (REQ) | prometheus | Kubernetes namespace for all Prometheus resources
prometheus_master (REQ) | true / false | Master vs Remote mode; controls remoteWrite, scrapeClasses, ruleSelector, and alert injection (see above)
retention_time | 15d | TSDB data retention period
retention_size | 40GB | TSDB data retention size limit
storage_size | 50Gi | PersistentVolumeClaim size for TSDB storage
push_gateway | true / false | Deploys PushGateway + ServiceMonitor (pushgateway.yaml, :9091)
resource_profile | medium | Selects resource patch: small, medium, large, x-large
kube_state_metrics_enabled | true / false | Enables kube-state-metrics deployment and ServiceMonitor
chart_version_prometheus | 11.0.2 | kube-prometheus-stack Helm chart version
remote_host (REQ) | https://prom.sre.io/api/v1/write | Master only. Required for external federation. URL where Remote instances send metrics via remoteWrite. If empty, no remoteWrite is configured on Remote instances even if linked.
remote_alerts (REQ) | https://alerts.sre.io | Master only. Required for external federation. Alertmanager URL where Remote instances send alerts. If empty, Remote uses its own local Alertmanager.
smtp_host | smtp.example.com:587 | SMTP host for Alertmanager email notifications (stored in alert.env)
from | alerts@example.com | SMTP sender address for alert emails
smtp_user | alert-user | SMTP username for authentication
smtp_password | secret | SMTP password for authentication
slack_api | https://hooks.slack.com/... | Slack webhook URL for alert notifications
configmap | key=value | Custom entries appended to alert.env configmap
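
As an illustration of the remote_alerts fallback, the sketch below turns the Master's remote_alerts URL into an entry shaped like Prometheus's alertmanager configuration (scheme, static_configs, optional path_prefix), which is the shape additionalAlertManagerConfigs accepts. The function name is hypothetical, and whether the generator splits the URL this way is an assumption.

```python
from typing import Any, Dict, List
from urllib.parse import urlsplit


def alertmanager_configs(remote_alerts: str) -> List[Dict[str, Any]]:
    """Build additionalAlertManagerConfigs-style entries from a URL.

    An empty remote_alerts returns an empty list, meaning the Remote
    keeps using its own local Alertmanager.
    """
    if not remote_alerts:
        return []

    parts = urlsplit(remote_alerts)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"remote_alerts must be a full URL, got {remote_alerts!r}")

    config: Dict[str, Any] = {
        "scheme": parts.scheme,
        # Targets are host[:port] only; the scheme is a separate field.
        "static_configs": [{"targets": [parts.netloc]}],
    }
    if parts.path and parts.path != "/":
        config["path_prefix"] = parts.path
    return [config]


print(alertmanager_configs("https://alerts.sre.io"))
```

With the example value from the table, the result is a single entry with scheme "https" and target "alerts.sre.io". An empty string yields an empty list, which matches the documented fallback to the local Alertmanager.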

Links

Link Type | Direction | What It Automates
prometheus-prometheus | Outbound (is_external) | Federation link: Remote sends remoteWrite + alerts to Master across clusters. Auto-configures remoteWrite URL, API key auth, TLS, queue config, additionalAlertManagerConfigs. Link attribute: api_key (authentication for remote write).
prometheus-apisix | Outbound | Generates alerts/apisix-alerts.yaml + enables ServiceMonitor on APISIX side
prometheus-cert_manager | Outbound | Generates alerts/cert-mger-alert.yaml (certificate expiry and renewal alerts)
prometheus-elasticsearch | Outbound | Generates alerts/elasticsearch-alerts.yaml (cluster health and indexing alerts)
prometheus-keycloak_operator | Outbound | Generates alerts/keycloak-alerts.yaml (Keycloak availability alerts)
prometheus-loki | Outbound | Generates alerts/loki-alerts.yaml (ingestion rate and error alerts)
prometheus-mongodb | Outbound | Generates alerts/mongo-alerts.yaml + mongo-operator.yaml (replication, connection, operator alerts)
prometheus-opa | Outbound | Generates alerts/opa.yaml (OPA Gatekeeper policy violation alerts)
prometheus-rabbitmq | Outbound | Generates alerts/rabbitmq-alers.yaml (queue depth and memory alerts)
prometheus-qdrant | Outbound | Generates alerts/qdrant-alerts.yaml (collection and memory alerts)
prometheus-nextcloud | Outbound | Generates alerts/nextcloud-alerts.yaml (Nextcloud availability alerts)
Note: Each prometheus-{component} outbound link generates a PrometheusRule alert file on the Prometheus side and enables a ServiceMonitor on the target component's side. Alert files are removed in post-gen if the link doesn't exist. The entire alerts/ directory is excluded from kustomization when prometheus_master: true or when no alert links exist.
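
The rule in the note above can be sketched as a lookup from link type to alert files plus a condition for the alerts/ directory. The file names are taken from the Links table. The function names and the list-of-strings representation of links are hypothetical and do not reflect the generator's real data model.

```python
from typing import Dict, Iterable, List

# Link type -> alert files under k8s/deploy/base/alerts/ (from the table).
ALERT_FILES: Dict[str, List[str]] = {
    "prometheus-apisix": ["apisix-alerts.yaml"],
    "prometheus-cert_manager": ["cert-mger-alert.yaml"],
    "prometheus-elasticsearch": ["elasticsearch-alerts.yaml"],
    "prometheus-keycloak_operator": ["keycloak-alerts.yaml"],
    "prometheus-loki": ["loki-alerts.yaml"],
    "prometheus-mongodb": ["mongo-alerts.yaml", "mongo-operator.yaml"],
    "prometheus-opa": ["opa.yaml"],
    "prometheus-rabbitmq": ["rabbitmq-alers.yaml"],
    "prometheus-qdrant": ["qdrant-alerts.yaml"],
    "prometheus-nextcloud": ["nextcloud-alerts.yaml"],
}


def active_alert_files(links: Iterable[str], master: bool) -> List[str]:
    """Alert files that stay after post-gen for the given links."""
    if master:
        return []  # alerts/ is excluded entirely on the Master
    files: List[str] = []
    for link in links:
        # Links without alert files (e.g. prometheus-prometheus) add nothing.
        files.extend(ALERT_FILES.get(link, []))
    return files


def include_alerts_dir(links: Iterable[str], master: bool) -> bool:
    """True when alerts/ should be listed in the base kustomization."""
    return bool(active_alert_files(links, master))


links = ["prometheus-prometheus", "prometheus-mongodb", "prometheus-loki"]
print(active_alert_files(links, master=False))
print(include_alerts_dir(links, master=True))
```

For a Remote with the three links shown, the first call lists mongo-alerts.yaml, mongo-operator.yaml, and loki-alerts.yaml. The second call prints False because the directory is excluded whenever prometheus_master is true.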

Generated Files

File | Condition | Contains
k8s/deploy/base/namespace.yaml | Always | Namespace
k8s/deploy/base/kustomization.yaml | Always | Resources, secretGenerator (alert.env), resource_profile patch, conditional alerts/ directory
k8s/deploy/base/pushgateway.yaml | push_gateway: true | PushGateway Deployment + Service + ServiceMonitor (:9091)
k8s/deploy/base/patch/resource-{profile}.yaml | Per resource_profile | CPU/memory limits for Prometheus, Alertmanager, Operator (small/medium/large/xlarge)
k8s/deploy/base/alerts/kustomization.yaml | NOT master AND any alert link | Kustomize listing of active alert files (conditional per link)
k8s/deploy/base/alerts/apisix-alerts.yaml | prometheus-apisix linked | APISIX PrometheusRule alerts
k8s/deploy/base/alerts/cert-mger-alert.yaml | prometheus-cert_manager linked | Certificate expiry and renewal alerts
k8s/deploy/base/alerts/elasticsearch-alerts.yaml | prometheus-elasticsearch linked | Cluster health and indexing alerts
k8s/deploy/base/alerts/keycloak-alerts.yaml | prometheus-keycloak_operator linked | Keycloak availability alerts
k8s/deploy/base/alerts/loki-alerts.yaml | prometheus-loki linked | Ingestion rate and error alerts
k8s/deploy/base/alerts/mongo-alerts.yaml | prometheus-mongodb linked | MongoDB replication and connection alerts
k8s/deploy/base/alerts/mongo-operator.yaml | prometheus-mongodb linked | MongoDB operator alerts
k8s/deploy/base/alerts/opa.yaml | prometheus-opa linked | OPA Gatekeeper policy violation alerts
k8s/deploy/base/alerts/rabbitmq-alers.yaml | prometheus-rabbitmq linked | Queue depth and memory alerts
k8s/deploy/base/alerts/qdrant-alerts.yaml | prometheus-qdrant linked | Collection and memory alerts
k8s/deploy/base/alerts/nextcloud-alerts.yaml | prometheus-nextcloud linked | Nextcloud availability alerts
k8s/deploy/base/secret/alert.env | Always | SMTP + Slack credentials for Alertmanager notifications + custom configmap entries
k8s/deploy/base/secret/minio.env | Always (SOPS) | Remote write credentials (SOPS encrypted)
helm/helm-values.yaml | Always | kube-prometheus-stack Helm values: prometheusSpec (scrapeClasses, remoteWrite, ruleSelector, storage), alertmanager, operator, nodeExporter, kubeStateMetrics, additionalAlertManagerConfigs
helm/generate-yaml.sh | Always | Helm template render script; outputs prometheus.yaml