Kubernetes Cost Optimization: How to Save 40–60% on Cloud Infrastructure in 2026¶
Kubernetes has become the standard for running enterprise applications, but with it comes a challenge many teams underestimate: cloud infrastructure costs grow faster than the value delivered. The CNCF 2026 survey shows that the average organization overpays for Kubernetes infrastructure by 35–50%, mostly due to misconfigured resource requests, idle compute, and the absence of a FinOps culture.
This article is a practical guide to eliminating these losses.
Where money disappears — anatomy of Kubernetes waste¶
Over-provisioning resource requests¶
The biggest source of waste. Developers set requests and limits conservatively because nobody wants their application to be OOMkilled. The result: average CPU utilization in a cluster is typically 15–25%, memory 40–60%.
resources:
  requests:
    cpu: "500m"       # the application actually uses ~50m
    memory: "512Mi"   # actually ~120Mi
  limits:
    cpu: "2000m"
    memory: "2Gi"
This pod occupies a billing slot for 500m CPU and 512Mi RAM — even though 90% of that is never used.
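To get a feel for what this means in money, here is a rough back-of-the-envelope calculation for the pod above. The per-hour prices are illustrative assumptions, not any provider's actual rates:

```python
# Illustrative on-demand prices (assumed): $0.033 per vCPU-hour, $0.0045 per GiB-hour
CPU_PRICE_PER_CORE_HOUR = 0.033
MEM_PRICE_PER_GIB_HOUR = 0.0045
HOURS_PER_MONTH = 730

def monthly_cost(cpu_cores: float, mem_gib: float) -> float:
    """Monthly cost of reserving the given CPU and memory around the clock."""
    return HOURS_PER_MONTH * (cpu_cores * CPU_PRICE_PER_CORE_HOUR
                              + mem_gib * MEM_PRICE_PER_GIB_HOUR)

requested = monthly_cost(0.5, 0.5)     # 500m CPU, 512Mi requested
actual = monthly_cost(0.05, 0.117)     # ~50m CPU, ~120Mi actually used
print(f"requested: ${requested:.2f}/mo, used: ${actual:.2f}/mo, "
      f"waste: {100 * (1 - actual / requested):.0f}%")
```

With these assumed prices, roughly 88% of what the pod reserves is paid for but never used; multiply that by hundreds of pods and the cluster-level waste becomes obvious.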
Idle namespaces and zombie workloads¶
Development and staging environments run 24/7, despite being active only 8 hours a day. Forgotten jobs, completed CronJobs with history, old ReplicaSets — you’re paying for all of it.
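One class of zombies is easy to surface programmatically: ReplicaSets from old rollouts that sit at 0 replicas but were never deleted. A minimal sketch that filters the JSON from `kubectl get rs --all-namespaces -o json` (the sample data here is hypothetical):

```python
import json

def find_zombie_replicasets(rs_json: dict) -> list[str]:
    """Return 'namespace/name' of ReplicaSets scaled to 0 replicas,
    typically leftovers from old Deployment rollouts."""
    return [
        f'{rs["metadata"]["namespace"]}/{rs["metadata"]["name"]}'
        for rs in rs_json.get("items", [])
        if rs["spec"].get("replicas", 0) == 0
    ]

# Feed it the output of: kubectl get rs --all-namespaces -o json
sample = json.loads("""{"items": [
  {"metadata": {"namespace": "prod", "name": "api-5f6d8"}, "spec": {"replicas": 0}},
  {"metadata": {"namespace": "prod", "name": "api-7c9b2"}, "spec": {"replicas": 3}}
]}""")
print(find_zombie_replicasets(sample))   # -> ['prod/api-5f6d8']
```

The same pattern works for completed Jobs and orphaned PVCs: dump to JSON, filter, review, delete.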
Suboptimal instance types¶
Running a memory-intensive workload on a compute-optimized instance (or vice versa) means paying for capacity you can never use.
Resource Optimization — concrete steps¶
1. Goldilocks — automated resource request recommendations¶
Goldilocks analyzes actual usage via VPA (Vertical Pod Autoscaler) and recommends the right values.
# Install Goldilocks
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install goldilocks fairwinds-stable/goldilocks \
  --namespace goldilocks \
  --create-namespace

# Label a namespace for analysis
kubectl label namespace production goldilocks.fairwinds.com/enabled=true

# Open the Goldilocks dashboard
kubectl -n goldilocks port-forward svc/goldilocks-dashboard 8080:80
The dashboard shows recommended requests/limits for each deployment based on actual P50/P99 usage over the past N days.
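Under the hood, this kind of recommendation is percentile math over observed usage. A simplified sketch of the idea (the sample data and the 15% headroom factor are illustrative assumptions, not Goldilocks internals):

```python
def recommend_request(samples_millicores: list[float],
                      percentile: float = 0.99,
                      headroom: float = 1.15) -> int:
    """Recommend a CPU request: the chosen percentile of observed usage
    plus a safety headroom, in millicores."""
    ordered = sorted(samples_millicores)
    idx = min(len(ordered) - 1, int(percentile * len(ordered)))
    return round(ordered[idx] * headroom)

# Hypothetical per-minute CPU samples (millicores) with one spike
samples = [40, 45, 50, 48, 52, 55, 60, 47, 44, 120]
print(recommend_request(samples))   # -> 138
```

Compare that 138m against the 500m request from the earlier example: even a P99-plus-headroom policy cuts the reservation by more than half.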
2. VPA (Vertical Pod Autoscaler) in recommendation mode¶
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-service-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  updatePolicy:
    updateMode: "Off"   # recommendations only, nothing is applied automatically
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 2000m
          memory: 4Gi
        controlledResources: ["cpu", "memory"]
# Show the recommendations
kubectl describe vpa api-service-vpa -n production
# Look for the "Target" values for cpu and memory
3. HPA with custom metrics¶
Horizontal Pod Autoscaler based on custom business metrics (not just CPU):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 min before scaling down
      policies:
        - type: Percent
          value: 25
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
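The core of the HPA control loop is a simple ratio documented by Kubernetes: `desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)`, clamped to the min/max bounds. A sketch of that calculation for the custom metric above (stabilization windows and scaling policies are left out):

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_value: float,
                         target_value: float,
                         min_replicas: int = 2,
                         max_replicas: int = 20) -> int:
    """desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric),
    clamped to the bounds from the HPA spec."""
    desired = math.ceil(current_replicas * current_value / target_value)
    return max(min_replicas, min(max_replicas, desired))

# 4 pods each seeing 180 req/s against the 100 req/s target -> scale up
print(hpa_desired_replicas(4, 180, 100))   # -> 8
# Traffic drops to 30 req/s per pod -> shrink toward the floor
print(hpa_desired_replicas(4, 30, 100))    # -> 2
```

This is why the target value matters so much for cost: a target of 70% CPU instead of 30% roughly halves the steady-state replica count for the same load.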
4. KEDA for event-driven autoscaling¶
For workloads driven by queues (Kafka, SQS, Redis):
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaledobject
  namespace: production
spec:
  scaleTargetRef:
    name: queue-worker
  minReplicaCount: 0    # scale to zero!
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.eu-west-1.amazonaws.com/123456789/my-queue
        queueLength: "10"   # 1 replica per 10 messages
        awsRegion: eu-west-1
Scale-to-zero is key — a worker with no messages = 0 pods = 0 cost.
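The scaling math behind the SQS trigger can be approximated as queue depth divided by the `queueLength` target. This is a simplification (real KEDA delegates the 1-to-n range to an HPA and handles activation separately), but it captures the cost behavior:

```python
import math

def keda_sqs_replicas(queue_depth: int,
                      messages_per_replica: int = 10,
                      min_replicas: int = 0,
                      max_replicas: int = 50) -> int:
    """Approximate replica count for the ScaledObject above:
    ceil(queue depth / queueLength), clamped; an empty queue means 0 pods."""
    if queue_depth <= 0:
        return min_replicas
    return max(min_replicas,
               min(max_replicas, math.ceil(queue_depth / messages_per_replica)))

print(keda_sqs_replicas(0))     # empty queue -> scale to zero
print(keda_sqs_replicas(95))    # 95 messages -> 10 replicas
```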
Node Optimization¶
Spot/Preemptible instances with Karpenter¶
Karpenter is a modern node autoscaler from AWS that can intelligently mix spot and on-demand instances:
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]   # ARM is cheaper
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
      nodeClassRef:
        name: default
  disruption:
    # Note: in v1beta1, consolidateAfter may only be combined with WhenEmpty
    consolidationPolicy: WhenUnderutilized
  limits:
    cpu: 1000
    memory: 4000Gi
Karpenter consolidates nodes automatically — if 3 workloads sit on 3 nodes, it repacks them onto 1 node and shuts down the remaining 2.
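Conceptually, consolidation is a bin-packing problem. A first-fit-decreasing sketch shows why repacking saves money; this is illustrative only, since real scheduling also weighs memory, affinity, disruption budgets, and more:

```python
def pack(pod_cpu_requests: list[float], node_capacity: float) -> list[list[float]]:
    """First-fit-decreasing bin packing: place each pod on the first node
    with room, opening a new node only when none fits."""
    nodes: list[list[float]] = []
    for req in sorted(pod_cpu_requests, reverse=True):
        for node in nodes:
            if sum(node) + req <= node_capacity:
                node.append(req)
                break
        else:
            nodes.append([req])
    return nodes

# Pods currently spread across 3 half-empty nodes fit on one 4-vCPU node
pods = [1.0, 0.5, 1.5]
print(len(pack(pods, node_capacity=4.0)))   # -> 1
```

In a cluster billed per node, going from 3 nodes to 1 for the same workload is a direct two-thirds reduction on that slice of spend.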
Regular cleanup of idle nodes¶
#!/bin/bash
# Identify potentially underutilized nodes
kubectl get nodes -o json | jq -r '
  .items[]
  | select(.status.conditions[] | select(.type=="Ready" and .status=="True"))
  | .metadata.name
    + " | CPU: " + .status.capacity.cpu
    + " | Age: " + .metadata.creationTimestamp'

# Check actual utilization via metrics-server
kubectl top nodes --sort-by=cpu | awk '$3+0 < 20 {print "LOW CPU:", $0}'
Namespace and Environment Cleanup¶
Automatic deletion of development environments¶
#!/usr/bin/env python3
"""Auto-cleanup of idle development namespaces."""
import subprocess
import json
from datetime import datetime, timezone, timedelta


def get_namespace_last_activity(namespace: str) -> datetime:
    """Determine the last activity in a namespace from its events."""
    result = subprocess.run(
        ["kubectl", "get", "events", "-n", namespace,
         "--sort-by=.lastTimestamp", "-o", "json"],
        capture_output=True, text=True
    )
    events = json.loads(result.stdout)
    if not events["items"]:
        return datetime.min.replace(tzinfo=timezone.utc)
    last_event = events["items"][-1]
    timestamp = last_event["lastTimestamp"]
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def cleanup_idle_namespaces(max_idle_hours: int = 48):
    """Delete namespaces prefixed 'dev-' that have been idle longer than N hours."""
    result = subprocess.run(
        ["kubectl", "get", "namespaces", "-o", "json"],
        capture_output=True, text=True
    )
    namespaces = json.loads(result.stdout)
    now = datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=max_idle_hours)
    for ns in namespaces["items"]:
        name = ns["metadata"]["name"]
        if not name.startswith("dev-"):
            continue
        last_activity = get_namespace_last_activity(name)
        if last_activity < cutoff:
            idle_hours = (now - last_activity).total_seconds() / 3600
            print(f"Deleting idle namespace {name} (idle {idle_hours:.0f}h)")
            subprocess.run(["kubectl", "delete", "namespace", name])


if __name__ == "__main__":
    cleanup_idle_namespaces(max_idle_hours=48)
CronJob for overnight scale-down¶
# Scale staging to 0 replicas overnight
apiVersion: batch/v1
kind: CronJob
metadata:
  name: staging-scale-down
  namespace: staging
spec:
  schedule: "0 20 * * 1-5"   # Monday-Friday 20:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler-sa
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  kubectl scale deployment --all --replicas=0 -n staging
                  kubectl scale statefulset --all --replicas=0 -n staging
          restartPolicy: OnFailure
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: staging-scale-up
  namespace: staging
spec:
  schedule: "0 8 * * 1-5"   # Monday-Friday 8:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler-sa
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  kubectl scale deployment --all --replicas=2 -n staging
          restartPolicy: OnFailure
FinOps — Kubecost and cost visibility¶
Kubecost for granular cost tracking¶
# Install Kubecost
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="your-token" \
  --set prometheus.server.persistentVolume.size=50Gi
Kubecost lets you see costs per namespace, deployment, label, or team — essential for a chargeback model.
Cost allocation labels¶
# Every workload must carry cost allocation labels
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
  labels:
    app: api-service
    team: backend              # owning team
    cost-center: "CC-1042"     # cost center
    environment: production
    product: core-platform

# Kubecost query: cost per team
curl "http://kubecost/model/allocation?window=30d&aggregate=label:team&idle=true" | \
  jq -r '.data[0] | to_entries | sort_by(-.value.totalCost) |
      .[] | "\(.key): $\(.value.totalCost | round)"'
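For chargeback reports you will usually want the same aggregation in a script. A minimal Python sketch of the post-processing, assuming the response shape returned by the `aggregate=label:team` query above (the sample figures are made up):

```python
def cost_per_team(allocation_response: dict) -> list[tuple[str, float]]:
    """Rank teams by totalCost from a Kubecost /model/allocation response."""
    window = allocation_response["data"][0]
    ranked = sorted(window.items(),
                    key=lambda kv: kv[1]["totalCost"], reverse=True)
    return [(team, round(entry["totalCost"], 2)) for team, entry in ranked]

# Hypothetical 30-day window aggregated by the 'team' label
sample = {"data": [{
    "backend": {"totalCost": 1423.7},
    "frontend": {"totalCost": 512.3},
    "__idle__": {"totalCost": 220.0},
}]}
for team, cost in cost_per_team(sample):
    print(f"{team}: ${cost}")
```

Note the `__idle__` entry: surfacing idle cost as its own line item in the report is often the fastest way to get teams to care.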
Results — real numbers¶
After implementing these measures in a typical enterprise cluster:
| Optimization | Typical savings |
|---|---|
| Right-sizing requests (Goldilocks) | 20–30% |
| Spot instances (70% of workloads) | 60–70% on compute |
| Scale-to-zero for dev/staging | 40–60% on nonprod |
| Karpenter consolidation | 10–20% |
| Cleanup idle resources | 5–15% |
| Total | 40–60% of total costs |
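Note that the rows do not simply add up to the total: each measure only reduces the slice of spend it applies to, and later measures act on what remains. A rough model with illustrative shares and rates:

```python
def combined_savings(measures: list[tuple[float, float]]) -> float:
    """Apply each (share_of_spend, saving_rate) measure sequentially:
    every measure reduces only its share of the *remaining* spend."""
    remaining = 1.0
    for share, rate in measures:
        remaining -= remaining * share * rate
    return 1.0 - remaining

# Illustrative: right-sizing across all spend, spot on 70% of it,
# scale-to-zero on the ~30% that is nonprod
measures = [(1.0, 0.25), (0.7, 0.5), (0.3, 0.4)]
print(f"{combined_savings(measures):.0%}")   # -> 57%
```

Under these assumed numbers the combined effect lands at about 57%, consistent with the 40–60% range in the table.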
Implementation plan¶
**Week 1–2: Visibility**
- Deploy Kubecost or OpenCost
- Add cost allocation labels to all workloads
- Audit resource utilization via Goldilocks

**Week 3–4: Quick wins**
- Right-size the top 20 most over-provisioned deployments
- Enable scale-to-zero for dev/staging overnight
- Clean up zombie workloads

**Month 2: Automation**
- Karpenter or Cluster Autoscaler with a spot pool
- HPA/KEDA for key services
- Automatic namespace cleanup

**Month 3+: FinOps culture**
- Chargeback reports per team
- Cost budgets and alerting
- Quarterly reviews with development teams
Conclusion¶
Kubernetes cost optimization is not a one-time action — it’s a continuous process. Start with visibility (Kubecost), continue with right-sizing (Goldilocks + VPA), and automate scaling (HPA, KEDA, Karpenter). The result is infrastructure that grows with your needs, not despite them.
CORE SYSTEMS helps enterprise organizations implement FinOps culture and Kubernetes cost governance. Contact us for an audit of your infrastructure.