Every component in the LGTM stack can write to object storage. Loki stores compressed log chunks. Mimir writes metric blocks. Tempo lands trace data. Even our PostgreSQL backups go to S3 via Barman. If the observability stack is a pizza kitchen, object storage is the walk-in cooler: everything ends up here, and if it goes down, the kitchen stops cold.
When we started this project, the storage decision was straightforward. We already ran Nutanix Objects in both data centers for other workloads. It speaks S3. Loki and Mimir speak S3. The question was not whether to use Objects but how to architect it across two DCs for resilience without overcomplicating things.
This is article two in the LGTM on Nutanix series. Article 1 covered the architecture overview. This one goes deep on the storage foundation that everything else sits on, and article 3a covers the collection layer that feeds it.
Why Nutanix Objects
The decision came down to simplicity and single-vendor value. Nutanix gives us a full stack: Kubernetes compute, S3-compatible object storage, cross-DC replication, and a support contract that covers the whole thing. Using Objects means one fewer distributed system to babysit, and data sovereignty is automatic because nothing leaves our network.
That said, we are using standard S3 APIs. If we ever need to migrate to a different S3-compatible store, the Helm values change and nothing else does. No lock-in beyond a config file. It is like choosing a brand of walk-in cooler: they all keep food cold, the shelves are standard sizes, and you can swap brands later without redesigning the kitchen.
The Cross-DC Architecture
This is the part the standard documentation does not cover: how to architect object storage across two data centers for an observability stack that needs to survive losing a DC.
+---------------------------------------------------------------+
|                    F5 Global Load Balancer                    |
|                  https://objects.example.com                  |
|                                                               |
|        Replicated buckets (backups, runner-cache) only        |
+----------------+--------------------+-------------------------+
                 |                    |
                 v                    v
+----------------+-----+      +-------+------------------------+
|  DC-A Objects        |      |  DC-B Objects                  |
|  dc-a-objects.       |      |  dc-b-objects.                 |
|  example.com         |      |  example.com                   |
|                      |      |                                |
|  - Built-in LB       |      |  - Built-in LB                 |
|  - Direct DC writes  |      |  - Direct DC writes            |
|                      |      |                                |
|  Buckets:            |      |  Buckets:                      |
|  - loki-chunks-dc-a  |      |  - loki-chunks-dc-b            |
|  - mimir-blocks-dc-a |      |  - mimir-blocks-dc-b           |
|  - tempo-traces-dc-a |      |  - tempo-traces-dc-b           |
|                      |      |                                |
|  Shared (replicated):|      |  Shared (replicated):          |
|  - observability-    |<====>|  - observability-              |
|      backups         | repl |      backups                   |
|  - runner-cache      |      |  - runner-cache                |
+----------------------+      +--------------------------------+
           ^                                  ^
           |                                  |
       (writers)                          (writers)
          DC-A                               DC-B
  Loki, Mimir, Tempo                 Loki, Mimir, Tempo
Three endpoints in DNS:
DC-A Objects: https://dc-a-objects.example.com (per-DC writes)
DC-B Objects: https://dc-b-objects.example.com (per-DC writes)
Global LB: https://objects.example.com (F5 GSLB, replicated reads)
The F5 global load balancer fronts only the workloads that need cross-DC access. Nutanix Objects has its own built-in load balancer for local traffic within each DC, so for non-replicated data we point directly at the local DC endpoint. The F5 GSLB handles the shared backup buckets where either DC might need to read.
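The routing rule above can be sketched as a small shell helper. This is a minimal illustration, not something from our tooling: the function name `endpoint_for_bucket` is hypothetical, and it assumes per-DC buckets always end in `-dc-a` or `-dc-b` while anything else is a shared, replicated bucket.

```shell
# Hypothetical helper: choose the Objects endpoint for a bucket.
# Per-DC buckets (names ending in -dc-a / -dc-b) go straight to that DC's
# endpoint; everything else is treated as a shared, replicated bucket
# and goes through the F5 GSLB name.
endpoint_for_bucket() {
  case "$1" in
    *-dc-a) echo "https://dc-a-objects.example.com" ;;
    *-dc-b) echo "https://dc-b-objects.example.com" ;;
    *)      echo "https://objects.example.com" ;;
  esac
}

# Print the mapping for a few of the buckets used in this article.
for bucket in loki-chunks-dc-a mimir-blocks-dc-b observability-backups; do
  printf '%s -> %s\n' "$bucket" "$(endpoint_for_bucket "$bucket")"
done
```

The point of keeping the rule this simple is that the bucket name alone tells you which endpoint to use, which is handy in scripts and during incident response.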
SSL Certificates
We replaced the default Nutanix self-signed certificates with properly signed certs that include the right Subject Alternative Names (SANs) for all three endpoints. This avoids TLS headaches in every application that talks to Objects. Loki, Mimir, Tempo, Barman, and the off-cluster Alloy agents all validate certs properly without any insecure_skip_verify hacks.
Get the certs right up front. It saves hours of debugging later, and infosec teams sleep better.
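One way to check the SANs before pointing any application at the endpoints is to parse what OpenSSL reports. The sketch below is an assumption-heavy illustration: the function name `required_sans_present` is hypothetical, it expects exact DNS entries (a wildcard SAN would not match), and the live example in the comments needs OpenSSL 1.1.1 or newer for the `-ext` option.

```shell
# required_sans_present NAME...: read `openssl x509 -ext subjectAltName`
# output on stdin and succeed only if every NAME is listed as a DNS SAN.
required_sans_present() {
  # One SAN entry per line, surrounding whitespace trimmed.
  san_entries=$(tr ',' '\n' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
  for name in "$@"; do
    if ! printf '%s\n' "$san_entries" | grep -qxF "DNS:${name}"; then
      echo "missing SAN: ${name}" >&2
      return 1
    fi
  done
}

# Example against a live endpoint (needs network access):
#   openssl s_client -connect dc-a-objects.example.com:443 \
#     -servername dc-a-objects.example.com </dev/null 2>/dev/null \
#   | openssl x509 -noout -ext subjectAltName \
#   | required_sans_present objects.example.com \
#       dc-a-objects.example.com dc-b-objects.example.com
```

Running the same check against all three hostnames confirms that whichever name a client uses, the certificate it receives covers it.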
Bucket Architecture
Get this right on day one: migrating buckets later is painful, and renaming the toppings menu mid-rush ends with the wrong pizza going out the door. Our design follows a simple principle: data stays local to the DC that writes it; backups replicate across DCs.
Per-DC Buckets (No Cross-DC Replication)
| Bucket Pattern | Component | Purpose |
|---|---|---|
| loki-chunks-{dc} | Loki | Compressed log data |
| mimir-blocks-{dc} | Mimir | TSDB metric blocks |
| tempo-traces-{dc} | Tempo | Trace data |
Each DC writes to its own buckets. DC-A's Loki writes to loki-chunks-dc-a, DC-B's Loki writes to loki-chunks-dc-b. No cross-DC replication on these. Why? Because Loki and Mimir already handle resilience through dual-write: both DCs ingest the same data independently, so each DC already has a full copy. Replicating the buckets would just double the storage for no benefit.
Shared Buckets (Cross-DC Replication Enabled)
| Bucket | Purpose | Replication |
|---|---|---|
| observability-backups | ArgoCD exports, etcd snapshots, Postgres Barman backups, failover state | Async cross-DC |
| runner-cache | GitHub Actions runner cache | Enabled |
The backup bucket is the safety net. If DC-A goes down, DC-B has the ArgoCD state, the Postgres backup, and the failover scripts needed to bring services up. Async replication is sufficient: we accept a small RPO on backup data in exchange for not paying the latency cost of synchronous replication on every write.
Why Not One Big Bucket?
Mimir enforces separate buckets for blocks, alertmanager state, and ruler config. It validates this at startup and refuses to run if you collapse them. Even if it did not, separating by component makes lifecycle policies cleaner, access controls simpler, and troubleshooting faster. When you are debugging a storage issue at 2 AM, you want to know immediately whether it is a Loki problem or a Mimir problem by looking at which bucket is affected.
Create the buckets via the AWS CLI pointed at your Objects endpoint:
export OBJECTS_ENDPOINT="https://dc-a-objects.example.com"
for bucket in loki-chunks-dc-a mimir-blocks-dc-a tempo-traces-dc-a; do
  aws s3 mb "s3://${bucket}" \
    --endpoint-url "${OBJECTS_ENDPOINT}" \
    --profile nutanix-objects
done
Repeat for DC-B with the DC-B endpoint and bucket names.
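If you would rather not repeat the loop by hand, the per-DC naming convention makes it easy to generate both sets. The following is a dry-run sketch: `per_dc_buckets` is a hypothetical helper, and the `echo` in front of `aws` means nothing is created until you remove it.

```shell
# per_dc_buckets DC: print the per-DC bucket names for one data center,
# following the {component}-{dc} pattern from the table above.
per_dc_buckets() {
  for prefix in loki-chunks mimir-blocks tempo-traces; do
    echo "${prefix}-$1"
  done
}

# Dry run: print the commands for both DCs instead of executing them.
for dc in dc-a dc-b; do
  endpoint="https://${dc}-objects.example.com"
  for bucket in $(per_dc_buckets "$dc"); do
    echo aws s3 mb "s3://${bucket}" \
      --endpoint-url "$endpoint" \
      --profile nutanix-objects
  done
done
```

The Mimir values later in this article also reference mimir-alertmanager-{dc} and mimir-ruler-{dc} buckets, so add those prefixes to the list if they are not created elsewhere.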
Credential Management with External Secrets
We do not create Kubernetes secrets manually. Credentials live in Azure Key Vault and sync to the clusters automatically via External Secrets Operator (ESO).
The flow:
- Nutanix Objects access keys are generated in Prism Central (dedicated key pairs per service: Loki, Mimir, and Tempo each get their own).
- Keys are stored in Azure Key Vault as secrets.
- ESO ExternalSecret resources in each namespace pull from Key Vault and create Kubernetes Secrets.
- Helm values reference the Kubernetes Secrets via extraEnvFrom.
# externalsecret-loki.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: loki-s3-credentials
  namespace: loki
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: azure-keyvault
    kind: ClusterSecretStore
  target:
    name: loki-s3-credentials
  data:
    - secretKey: AWS_ACCESS_KEY_ID
      remoteRef:
        key: loki-s3-access-key
    - secretKey: AWS_SECRET_ACCESS_KEY
      remoteRef:
        key: loki-s3-secret-key
This means credentials rotate without redeploying applications, Key Vault audit logs track every access, and no secrets ever land in Git. The same pattern applies to Mimir and Tempo: each gets its own ExternalSecret in its own namespace.
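Since the manifests differ only by service name, they can be stamped out from one template. This is a sketch under assumptions: `render_external_secret` is a hypothetical helper, the namespace is assumed to match the service name, and the Key Vault secret names are assumed to follow the `<service>-s3-access-key` / `<service>-s3-secret-key` convention used for Loki above.

```shell
# render_external_secret SERVICE: print an ExternalSecret manifest that
# mirrors the Loki one, with the service name substituted throughout.
# Assumes namespace == service name and Key Vault keys named
# <service>-s3-access-key / <service>-s3-secret-key.
render_external_secret() {
  svc="$1"
  cat <<EOF
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: ${svc}-s3-credentials
  namespace: ${svc}
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: azure-keyvault
    kind: ClusterSecretStore
  target:
    name: ${svc}-s3-credentials
  data:
    - secretKey: AWS_ACCESS_KEY_ID
      remoteRef:
        key: ${svc}-s3-access-key
    - secretKey: AWS_SECRET_ACCESS_KEY
      remoteRef:
        key: ${svc}-s3-secret-key
EOF
}

# Emit one YAML document per service, separated by ---.
for svc in mimir tempo; do
  render_external_secret "$svc"
  echo "---"
done
```

In a GitOps setup the rendered files would be committed next to the wrapper charts rather than applied by hand; they contain only references to Key Vault entries, never the secret values themselves.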
Configuring Loki for Nutanix Objects
Here is the Helm values configuration for Loki against Nutanix Objects. We use the wrapper chart pattern described in article one: a local Chart.yaml wrapping the upstream community chart, with shared values plus per-DC overrides.
# values-common.yaml (shared across both DCs)
loki:
  auth_enabled: false
  schemaConfig:
    configs:
      - from: "2024-01-01"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  storage:
    type: s3
    bucketNames:
      chunks: loki-chunks # Overridden per-DC
    s3:
      endpoint: dc-a-objects.example.com # Overridden per-DC
      region: us-east-1 # Required but arbitrary for non-AWS S3
      s3ForcePathStyle: true
      insecure: false
  structuredConfig:
    compactor:
      working_directory: /var/loki/compactor
      compaction_interval: 10m
      retention_enabled: true
      retention_delete_delay: 2h
      retention_delete_worker_count: 150
      delete_request_store: s3
    limits_config:
      retention_period: 8760h # 365 days default
      retention_stream:
        - selector: '{job="kube-audit"}'
          priority: 1
          period: 2160h # 90 days
        - selector: '{container="calico-node"}'
          priority: 2
          period: 720h # 30 days
        - selector: '{job="loki.source.kubernetes.pod_logs"}'
          priority: 3
          period: 4320h # 180 days
        - selector: '{job="windows_eventlog",level=~"Information|Verbose"}'
          priority: 4
          period: 2160h # 90 days
        - selector: '{job="windows_eventlog",level=~"Error|Critical"}'
          priority: 5
          period: 8760h # 365 days
extraEnvFrom:
  - secretRef:
      name: loki-s3-credentials
deploymentMode: SimpleScalable
write:
  replicas: 3
  persistence:
    size: 20Gi
    storageClass: local-path
read:
  replicas: 3
backend:
  replicas: 2
  persistence:
    size: 20Gi
    storageClass: local-path
minio:
  enabled: false
The per-DC override files only change what differs:
# values-dc-a.yaml
loki:
  storage:
    bucketNames:
      chunks: loki-chunks-dc-a
    s3:
      endpoint: dc-a-objects.example.com
# values-dc-b.yaml
loki:
  storage:
    bucketNames:
      chunks: loki-chunks-dc-b
    s3:
      endpoint: dc-b-objects.example.com
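Helm merges values files left to right, so the per-DC file has to come last for its bucket and endpoint to win. A quick way to sanity-check the layering is to render the chart locally for each DC. In this sketch the release name `loki` and the chart path `.` are assumptions about the wrapper chart layout, and `helm_render_cmd` is a hypothetical helper that only prints the command.

```shell
# helm_render_cmd DC: print the render command for one data center.
# The shared file comes first and the per-DC file last, because Helm
# gives priority to the right-most -f file.
# Release name "loki" and chart path "." are assumptions.
helm_render_cmd() {
  dc="$1"   # dc-a or dc-b
  echo "helm template loki . -f values-common.yaml -f values-${dc}.yaml"
}

for dc in dc-a dc-b; do
  helm_render_cmd "$dc"
done
```

Piping the rendered output through `grep loki-chunks-dc-b` (for DC-B) is a cheap confirmation that the override landed before ArgoCD ever sees the change.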
Configuring Mimir for Nutanix Objects
The same pattern applies, with shared values and per-DC overrides:
# values-common.yaml
mimir:
  structuredConfig:
    common:
      storage:
        backend: s3
        s3:
          endpoint: dc-a-objects.example.com # Overridden per-DC
          region: us-east-1
          insecure: false
          s3_force_path_style: true
    blocks_storage:
      backend: s3
      s3:
        bucket_name: mimir-blocks-dc-a # Overridden per-DC
      tsdb:
        dir: /data/tsdb
    alertmanager_storage:
      backend: s3
      s3:
        bucket_name: mimir-alertmanager-dc-a # Overridden per-DC
    ruler_storage:
      backend: s3
      s3:
        bucket_name: mimir-ruler-dc-a # Overridden per-DC
    compactor:
      data_dir: /data/compactor
      sharding_ring:
        kvstore:
          store: memberlist
    limits:
      compactor_blocks_retention_period: 8760h # 365 days
      max_global_series_per_user: 5000000
      ingestion_rate: 100000
      ingestion_burst_size: 200000
      max_global_exemplars_per_user: 100000
      out_of_order_time_window: 5m
global:
  extraEnvFrom:
    - secretRef:
        name: mimir-s3-credentials
ingester:
  replicas: 3
  persistentVolume:
    size: 20Gi
    storageClass: local-path
compactor:
  replicas: 1
  persistentVolume:
    size: 50Gi
    storageClass: local-path
store_gateway:
  replicas: 2
  persistentVolume:
    size: 20Gi
    storageClass: local-path
minio:
  enabled: false
Key differences from Loki:
- Separate buckets are enforced. Mimir validates at startup that blocks, alertmanager, and ruler use different buckets.
- Series limits. We set max_global_series_per_user to 5 million after hitting 98% of the original 1.5 million limit. Plan this before you onboard sources, not after the alerts start firing.
- Out-of-order window. Set to 5 minutes to handle clock skew and multi-source writes. Without this, you will see out of order sample errors.
- s3_force_path_style. Note the underscores, not camelCase. Mimir uses a different S3 config format than Loki, which is exactly the kind of inconsistency that bites you at 2 AM.
The Dual-Retention Strategy
We run retention at two layers, and I would recommend the same to anyone building this:
Layer 1: Application retention. Loki's compactor and Mimir's compactor handle data lifecycle based on configured retention periods. They understand schemas, indexes, and per-stream overrides. This is the primary mechanism.
Layer 2: S3 lifecycle policies. The safety net. We set bucket expiration to 380 days, 15 days longer than the 365-day application retention. If the compactor fails, falls behind, or has a bug, the S3 lifecycle policy ensures objects do not grow unbounded.
cat > /tmp/lifecycle-380d.json <<'EOF'
{
  "Rules": [{
    "ID": "expire-after-380-days",
    "Status": "Enabled",
    "Filter": { "Prefix": "" },
    "Expiration": { "Days": 380 }
  }, {
    "ID": "abort-incomplete-multipart",
    "Status": "Enabled",
    "Filter": { "Prefix": "" },
    "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
  }]
}
EOF
aws s3api put-bucket-lifecycle-configuration \
  --bucket "loki-chunks-dc-a" \
  --lifecycle-configuration file:///tmp/lifecycle-380d.json \
  --endpoint-url "${OBJECTS_ENDPOINT}" \
  --profile nutanix-objects
The 15-day buffer matters. Without it, a compactor that is slightly behind schedule could have its data deleted out from under it by the S3 lifecycle. That is a bad day. Think of it as two kitchen timers on the same batch of dough: one for when it should be done, one for when it must be done. The second one is your insurance policy.
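The relationship between the two layers is simple arithmetic: convert the application retention from hours to days, then add the buffer. A minimal sketch, where `lifecycle_days` is a hypothetical helper and the 15-day default buffer is the value used in this article:

```shell
# lifecycle_days RETENTION_HOURS [BUFFER_DAYS]
# Convert an application retention period in hours to whole days
# (rounding up), then add the safety buffer (default 15 days).
lifecycle_days() {
  retention_hours="$1"
  buffer_days="${2:-15}"
  echo $(( (retention_hours + 23) / 24 + buffer_days ))
}

lifecycle_days 8760   # 365-day Loki/Mimir retention -> prints 380
lifecycle_days 720    # 30-day retention            -> prints 45
```

Rounding up matters only when the retention is not a whole number of days; it keeps the S3 expiration from ever landing before the application retention ends.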
Other bucket lifecycle policies:
| Bucket | App Retention | S3 Lifecycle |
|---|---|---|
| loki-chunks-{dc} | 365 days (with per-stream overrides) | 380 days |
| mimir-blocks-{dc} | 365 days | 380 days |
| tempo-traces-{dc} | 30 days | 45 days |
| observability-backups | Varies by content | 90 days |
| runner-cache | Ephemeral | 7 days |
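Because every bucket uses the same two rules with a different expiration age, the policy document can be generated rather than hand-edited. The sketch below only prints the JSON for the DC-A buckets; `lifecycle_json` is a hypothetical helper, and the bucket and day pairs come from the table above.

```shell
# lifecycle_json DAYS: print a lifecycle document with the same two rules
# as the 380-day example, parameterized on the expiration age.
lifecycle_json() {
  cat <<EOF
{
  "Rules": [{
    "ID": "expire-after-$1-days",
    "Status": "Enabled",
    "Filter": { "Prefix": "" },
    "Expiration": { "Days": $1 }
  }, {
    "ID": "abort-incomplete-multipart",
    "Status": "Enabled",
    "Filter": { "Prefix": "" },
    "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
  }]
}
EOF
}

# bucket:days pairs taken from the table (DC-A shown).
for entry in loki-chunks-dc-a:380 mimir-blocks-dc-a:380 tempo-traces-dc-a:45; do
  bucket=${entry%%:*}
  days=${entry##*:}
  echo "== ${bucket}: expire after ${days} days"
  lifecycle_json "$days"
done
```

To apply a policy, redirect the output of `lifecycle_json` to a file and pass it to `aws s3api put-bucket-lifecycle-configuration` with the matching `--bucket` and `--endpoint-url`. The shared buckets follow the same shape with 90 and 7 days.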
Monitoring the Storage Layer
A bucket at its quota or a degraded Objects deployment takes down your observability stack, and a storage incident is the worst time to lose observability.
Nutanix Objects exposes Prometheus metrics through Prism Central. We scrape these with Alloy and send them to Mimir alongside everything else:
Object store metrics: https://<prism-central>:9440/oss/api/nutanix/metrics
Per-bucket metrics: https://<prism-central>:9440/oss/api/nutanix/metrics/<store>/<bucket>
Combine these with Loki's loki_compactor_* and Mimir's cortex_compactor_* / cortex_bucket_store_* metrics in a single Grafana dashboard for full pipeline visibility. You want to know when the compactor is falling behind before your retention policy stops working.
The Meta-Alerts We Actually Run
These are the alerts that watch the storage layer (and the rest of the stack that depends on it). Drop them straight into a Mimir ruler config: they are the working set we use, not theory.
groups:
  - name: meta-alerts
    rules:
      # S3 storage unreachable: also serves as an inhibition source
      # in Alertmanager so we do not page on every downstream symptom.
      - alert: S3StorageUnreachable
        expr: |
          (
            sum by (dc) (rate(thanos_objstore_bucket_operation_failures_total[5m]))
            /
            sum by (dc) (rate(thanos_objstore_bucket_operations_total[5m]))
          ) > 0.5
          and sum by (dc) (rate(thanos_objstore_bucket_operations_total[5m])) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "S3 storage unreachable in {{ $labels.dc }}"
      # S3 write latency degraded
      - alert: S3WriteLatencyHigh
        expr: |
          histogram_quantile(0.99, sum by (le) (
            rate(thanos_objstore_bucket_operation_duration_seconds_bucket{operation=~"upload|attributes"}[5m])
          )) > 5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "S3 write latency >5s (P99)"
      # Mimir series approaching the configured limit
      - alert: MimirHighCardinalitySeries
        expr: |
          (
            sum(cortex_ingester_active_series) / 5000000
          ) > 0.8
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Mimir active series >80% of limit (5M)"
      # Mimir ingester memory pressure: early OOM warning
      - alert: MimirIngesterMemoryHigh
        expr: |
          (
            container_memory_working_set_bytes{container="ingester", namespace="observability"}
            /
            kube_pod_container_resource_limits{resource="memory", container="ingester", namespace="observability"}
          ) > 0.8
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Mimir ingester memory >80% of limit"
      # Loki ingestion rate drops >50% compared to 1h ago
      - alert: LokiIngestionRateDrop
        expr: |
          (
            sum(rate(loki_distributor_bytes_received_total[5m]))
            /
            sum(rate(loki_distributor_bytes_received_total[5m] offset 1h))
          ) < 0.5
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Loki ingestion rate dropped >50%"
      # Alertmanager can't reach Teams or ServiceNow
      - alert: AlertmanagerNotificationFailures
        expr: rate(alertmanager_notifications_failed_total[5m]) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Alertmanager notification failures detected"
A few notes on why these are the ones we picked:
- S3StorageUnreachable is an inhibition source. When S3 is down, Loki and Mimir will throw write errors all over the place. Inhibiting downstream alerts during an S3 outage prevents alert storms and keeps the on-call signal clear.
- MimirHighCardinalitySeries at 80%. We sized the limit at 5M after hitting 98% on a 1.5M ceiling. The 80% threshold gives us weeks of warning instead of hours, enough time to trace which exporter introduced the explosion before we have to bump the limit again.
- MimirIngesterMemoryHigh at 80%. An ingester OOM is the worst-case scenario: you lose the WAL and any unflushed chunks. Catching memory pressure early is cheaper than recovering from a crash.
- LokiIngestionRateDrop is rate-relative, not absolute. Absolute thresholds break when traffic naturally varies. A 50% drop compared to 1h ago is suspicious regardless of baseline.
- AlertmanagerNotificationFailures is the alert about the alerting. If Teams and ServiceNow can't be reached, every other alert is silent. This one needs to fire on a different channel; we use email-of-last-resort.
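The inhibition itself lives in the Alertmanager configuration, not in the rule file. The sketch below prints what such a rule could look like; it is an illustration rather than our production config. `render_inhibit_rule` is a hypothetical helper, and the two target alerts are picked from the group above purely as examples of downstream symptoms.

```shell
# render_inhibit_rule SOURCE_ALERT TARGET_REGEX
# Print an Alertmanager inhibit_rules snippet: while SOURCE_ALERT fires,
# alerts whose name matches TARGET_REGEX are muted.
render_inhibit_rule() {
  source_alert="$1"
  target_regex="$2"
  cat <<EOF
inhibit_rules:
  - source_matchers:
      - alertname = "${source_alert}"
    target_matchers:
      - alertname =~ "${target_regex}"
EOF
}

# Target list is illustrative; choose the alerts that are pure
# symptoms of an S3 outage in your environment.
render_inhibit_rule S3StorageUnreachable "S3WriteLatencyHigh|LokiIngestionRateDrop"
```

No `equal` list is set here, so an outage in either DC mutes the targets everywhere. Scoping the inhibition with `equal: [dc]` would require every target alert to carry a dc label, which not all of the rules above do.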
Verifying the Deployment
After ArgoCD syncs the wrapper charts, walk through these checks:
# 1. Loki and Mimir pods are running
kubectl -n loki get pods
kubectl -n mimir get pods
# 2. ExternalSecrets synced successfully
kubectl -n loki get externalsecret
kubectl -n mimir get externalsecret
# 3. S3 connectivity from inside the cluster
kubectl -n loki run s3-test --rm -it --image=amazon/aws-cli \
  --env="AWS_ACCESS_KEY_ID=$(kubectl -n loki get secret loki-s3-credentials -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d)" \
  --env="AWS_SECRET_ACCESS_KEY=$(kubectl -n loki get secret loki-s3-credentials -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 -d)" \
  -- s3 ls --endpoint-url https://dc-a-objects.example.com
# 4. Loki is writing chunks (look for objects under the fake/ prefix, the tenant ID Loki uses when auth_enabled is false)
aws s3 ls s3://loki-chunks-dc-a/ --endpoint-url https://dc-a-objects.example.com --recursive | head
# 5. Mimir blocks bucket has anonymous/ tenant prefix
aws s3 ls s3://mimir-blocks-dc-a/ --endpoint-url https://dc-a-objects.example.com --recursive | head
If chunks and blocks are landing, retention is enabled, and your alert rules cover the compactor health, you have a production-ready storage foundation.
S3 Compatibility Notes
For completeness, the S3 behaviors to be aware of with Nutanix Objects:
- Region is required but arbitrary. Use us-east-1 as a dummy value. Some SDK versions panic without it.
- Lifecycle policies work well. Expiration and abort-incomplete-multipart are supported. Storage class transitions (Glacier-style tiering) are not, but that is irrelevant for observability workloads.
- S3 Select, IAM/STS, and bucket notifications have limited or no support. None are used by Loki, Mimir, or Tempo.
We hit zero S3 compatibility issues during setup. The key is getting the path-style access and region settings right from the start.
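The same two settings apply to the AWS CLI profile used throughout this article. A minimal sketch of the profile stanza, assuming the profile name `nutanix-objects` from the earlier commands; `objects_profile_config` is a hypothetical helper that only prints the text, and credentials are deliberately left out.

```shell
# objects_profile_config: print an AWS CLI config stanza that pins the
# dummy region and forces path-style addressing for S3 requests.
objects_profile_config() {
  cat <<'EOF'
[profile nutanix-objects]
region = us-east-1
s3 =
    addressing_style = path
EOF
}

objects_profile_config
```

Appending this output to `~/.aws/config` keeps the region and addressing style out of every individual command, so only `--endpoint-url` and `--profile` need to be passed.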
What We Learned
Honestly, the storage layer was the smoothest part of this project. Nutanix Objects version 5.3 just works for this use case. S3 has not been a bottleneck at any point. The setup followed the standard Nutanix documentation, and once we had SSL certs with proper SANs and the F5 load balancers configured, everything connected cleanly.
The real learning was not about S3 compatibility. It was about understanding the intricacies of Kubernetes and deploying components we had never run before. With a solid technical background, it was a learning experience but not a major lift. The architecture decisions (per-DC buckets, dual-retention, ESO for credentials) came from thinking through failure modes, not from hitting walls.
If I were giving advice to someone starting this:
- Get your certificates right first. Valid SANs, no insecure_skip_verify anywhere.
- Decide your bucket layout before you deploy. Migrating later is migration tax you do not need to pay.
- Run the dual-retention pattern from day one. App retention plus S3 lifecycle with a buffer. The buffer is the point.
- Use ESO from day one. Manual kubectl create secret is technical debt you will pay back at the worst possible time.
- Plan series and ingestion limits before onboarding sources. It is much easier to set them generous up front than to scramble when you hit 98%.
What Is Next
In the next article, we deploy Grafana Alloy, the collection agent that gets logs, metrics, and traces from your infrastructure into Loki, Mimir, and Tempo. That is where things get interesting, because we run three separate Alloy deployments (DaemonSet, network receiver, and traces receiver) and each has a different job. We will also cover why Telegraf still has a seat at the table alongside Alloy.
Happy automating!
This is article 2 of 10 in the LGTM on Nutanix series. Next up: Article 3a, Grafana Alloy on Kubernetes: Deployment.