
Kubernetes Cost Optimization: How to Save 40–60% on Cloud Infrastructure in 2026

26. 02. 2026 · 6 min read · CORE SYSTEMS · devops

Kubernetes has become the standard for running enterprise applications. But with that comes a challenge many teams underestimate: cloud infrastructure costs are growing faster than productivity. The CNCF 2026 survey shows that the average organization overpays for Kubernetes infrastructure by 35–50% — solely due to misconfigured resource requests, unused compute, and the absence of a FinOps culture.

This article is a practical guide to eliminating these losses.

Where money disappears — anatomy of Kubernetes waste

Over-provisioning resource requests

The biggest source of waste. Developers set requests and limits conservatively because nobody wants their application OOMKilled. The result: average CPU utilization in a cluster is typically 15–25%, and memory utilization 40–60%.

# A typical over-provisioned resource spec
resources:
  requests:
    cpu: "500m"    # Actual usage: ~50m
    memory: "512Mi" # Actual usage: ~120Mi
  limits:
    cpu: "2000m"
    memory: "2Gi"

This pod occupies a billing slot for 500m CPU and 512Mi RAM — even though 90% of that is never used.
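To put a number on that gap, a minimal sketch (the per-core price and the usage figures are illustrative assumptions, not measured values):

```python
# Rough cost of over-provisioning: requested capacity minus what is actually used.
# The $/core price below is an assumption for illustration.
def wasted_fraction(requested: float, used: float) -> float:
    """Fraction of a requested resource that is never used."""
    return (requested - used) / requested

# Using the example above: 500m CPU requested, ~50m actually used
cpu_waste = wasted_fraction(500, 50)   # 0.9
mem_waste = wasted_fraction(512, 120)  # ~0.77

# With an assumed $30/month per requested vCPU:
monthly_cpu_cost = 0.5 * 30            # 500m = half a core
wasted_dollars = monthly_cpu_cost * cpu_waste
print(f"CPU waste: {cpu_waste:.0%}, ~${wasted_dollars:.2f}/month per pod")
```

Multiply that per-pod figure by a few hundred replicas and the cluster-wide numbers from the CNCF survey stop looking surprising.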

Idle namespaces and zombie workloads

Development and staging environments run 24/7, despite being active only 8 hours a day. Forgotten jobs, completed CronJobs with history, old ReplicaSets — you’re paying for all of it.

Suboptimal instance types

Running a memory-intensive workload on a compute-optimized instance (or vice versa) — you pay for capacity you can’t use.
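One way to sanity-check the fit is to compare a workload's memory-to-CPU ratio with the ratios of the instance families. The ~2/4/8 GiB-per-vCPU figures below reflect typical AWS c/m/r families and are an assumption for illustration, not pricing data:

```python
# Pick the instance family whose GiB-per-vCPU ratio is closest to the workload's.
# Family ratios are typical values for AWS c/m/r families (an assumption here).
FAMILY_GIB_PER_VCPU = {
    "c (compute-optimized)": 2,
    "m (general-purpose)": 4,
    "r (memory-optimized)": 8,
}

def best_family(vcpus: float, mem_gib: float) -> str:
    ratio = mem_gib / vcpus
    return min(FAMILY_GIB_PER_VCPU, key=lambda f: abs(FAMILY_GIB_PER_VCPU[f] - ratio))

print(best_family(4, 32))  # memory-heavy workload -> r family
print(best_family(8, 16))  # compute-heavy workload -> c family
```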

Resource Optimization — concrete steps

1. Goldilocks — automated resource request recommendations

Goldilocks analyzes actual usage via VPA (Vertical Pod Autoscaler) and recommends the right values.

# Install
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install goldilocks fairwinds-stable/goldilocks \
  --namespace goldilocks \
  --create-namespace

# Label the namespace for analysis
kubectl label namespace production goldilocks.fairwinds.com/enabled=true

# Goldilocks dashboard
kubectl -n goldilocks port-forward svc/goldilocks-dashboard 8080:80

The dashboard shows recommended requests/limits for each deployment based on actual P50/P99 usage over the past N days.
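The underlying idea can be sketched in a few lines: take usage samples, set requests near a mid percentile and limits near a high one. The exact percentiles and headroom multipliers here are illustrative assumptions, not Goldilocks' internal logic (it derives its values from VPA's model):

```python
# Sketch of percentile-based right-sizing: requests ~ P50, limits ~ P99.
# The headroom multipliers are illustrative assumptions.
def percentile(samples: list[float], p: float) -> float:
    s = sorted(samples)
    idx = min(int(round(p / 100 * (len(s) - 1))), len(s) - 1)
    return s[idx]

def recommend(samples: list[float]) -> dict:
    return {
        "request": percentile(samples, 50) * 1.1,  # P50 + 10% headroom
        "limit": percentile(samples, 99) * 1.5,    # P99 + 50% headroom
    }

# CPU usage samples in millicores for a pod requesting 500m:
cpu_millicores = [40, 45, 50, 48, 52, 55, 60, 47, 49, 51]
rec = recommend(cpu_millicores)
print(rec)  # a request around ~54m instead of 500m
```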

2. VPA (Vertical Pod Autoscaler) in recommendation mode

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-service-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  updatePolicy:
    updateMode: "Off"  # Recommendations only; nothing is applied automatically
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 2000m
        memory: 4Gi
      controlledResources: ["cpu", "memory"]

# Show the recommendations
kubectl describe vpa api-service-vpa -n production
# Look for the "Target" values for cpu and memory

3. HPA with custom metrics

Horizontal Pod Autoscaler based on custom business metrics (not just CPU):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # 5 min before scaling down
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
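The scaling decision itself follows the standard HPA formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the min/max from the manifest. A short sketch makes it concrete:

```python
import math

# The standard HPA scaling formula from the Kubernetes documentation:
#   desired = ceil(current_replicas * current_metric / target_metric)
# clamped to the minReplicas/maxReplicas from the manifest above.
def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_r: int = 2, max_r: int = 20) -> int:
    desired = math.ceil(current * current_metric / target_metric)
    return max(min_r, min(max_r, desired))

# CPU at 90% with a 70% utilization target and 4 replicas:
print(desired_replicas(4, 90, 70))    # scales up to 6
# 250 req/s per pod against a 100 req/s target with 3 replicas:
print(desired_replicas(3, 250, 100))  # scales up to 8
```

The `behavior` section in the manifest then rate-limits how fast these target values may actually be applied.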

4. KEDA for event-driven autoscaling

For workloads driven by queues (Kafka, SQS, Redis):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaledobject
  namespace: production
spec:
  scaleTargetRef:
    name: queue-worker
  minReplicaCount: 0   # Scale to zero!
  maxReplicaCount: 50
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.eu-west-1.amazonaws.com/123456789/my-queue
      queueLength: "10"  # 1 replica per 10 messages
      awsRegion: eu-west-1

Scale-to-zero is key — a worker with no messages = 0 pods = 0 cost.
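The scaler's behavior with the configuration above is roughly: target replicas ≈ ceil(visible messages / queueLength), clamped to min/max, with an empty queue mapping to zero pods. A sketch of that behavior (a simplified model, not KEDA's exact algorithm):

```python
import math

# Simplified model of queue-length-driven scaling as configured above:
# roughly ceil(messages / messages_per_replica), clamped, with scale-to-zero.
def keda_replicas(messages: int, per_replica: int = 10,
                  min_r: int = 0, max_r: int = 50) -> int:
    if messages == 0:
        return min_r  # scale to zero: no messages, no pods, no cost
    return max(min_r, min(max_r, math.ceil(messages / per_replica)))

print(keda_replicas(0))    # 0  - empty queue costs nothing
print(keda_replicas(35))   # 4
print(keda_replicas(900))  # 50 - capped at maxReplicaCount
```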

Node Optimization

Spot/Preemptible instances with Karpenter

Karpenter is a modern open-source node autoscaler, originally developed by AWS, that can intelligently mix spot and on-demand instances:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64", "arm64"]  # ARM = cheaper
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values: ["c", "m", "r"]
      nodeClassRef:
        name: default
  disruption:
    consolidationPolicy: WhenUnderutilized
    consolidateAfter: 30s
  limits:
    cpu: 1000
    memory: 4000Gi

Karpenter consolidates nodes automatically — if 3 workloads sit on 3 nodes, it repacks them onto 1 node and shuts down the remaining 2.
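Consolidation is essentially a bin-packing problem. A first-fit-decreasing sketch, simplified to CPU only (Karpenter also weighs memory, instance pricing, and disruption budgets):

```python
# First-fit-decreasing bin packing on CPU only: a simplified model of node
# consolidation. Karpenter also considers memory, price, and disruption budgets.
def pack(pod_cpus: list[float], node_cpu: float) -> list[list[float]]:
    nodes: list[list[float]] = []
    for cpu in sorted(pod_cpus, reverse=True):
        for node in nodes:
            if sum(node) + cpu <= node_cpu:
                node.append(cpu)  # fits on an existing node
                break
        else:
            nodes.append([cpu])   # open a new node
    return nodes

# Three 1-vCPU workloads spread across three 4-vCPU nodes
# repack onto a single node; the other two can be shut down:
print(len(pack([1.0, 1.0, 1.0], node_cpu=4.0)))  # 1
```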

Regular cleanup of idle nodes

#!/bin/bash
# Script to identify under-utilized nodes

kubectl get nodes -o json | jq -r '
  .items[] |
  select(.status.conditions[] | select(.type=="Ready" and .status=="True")) |
  {
    name: .metadata.name,
    cpu_capacity: .status.capacity.cpu,
    mem_capacity: .status.capacity.memory,
    age: .metadata.creationTimestamp
  }
' | jq -r '.name + " | CPU: " + .cpu_capacity + " | Age: " + .age'

# Check actual utilization via metrics-server (NR>1 skips the header row)
kubectl top nodes --sort-by=cpu | awk 'NR>1 && $3+0 < 20 {print "LOW CPU:", $0}'

Namespace and Environment Cleanup

Automatic deletion of development environments

#!/usr/bin/env python3
"""Auto-cleanup idle development namespaces"""
import subprocess
import json
from datetime import datetime, timezone, timedelta

def get_namespace_last_activity(namespace: str) -> datetime:
    """Zjistí poslední aktivitu v namespace podle events"""
    result = subprocess.run(
        ["kubectl", "get", "events", "-n", namespace,
         "--sort-by=.lastTimestamp", "-o", "json"],
        capture_output=True, text=True
    )
    events = json.loads(result.stdout)
    if not events["items"]:
        return datetime.min.replace(tzinfo=timezone.utc)

    last_event = events["items"][-1]
    timestamp = last_event["lastTimestamp"]
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))

def cleanup_idle_namespaces(max_idle_hours: int = 48):
    """Smaže namespaces s prefixem 'dev-' které jsou idle více než N hodin"""
    result = subprocess.run(
        ["kubectl", "get", "namespaces", "-o", "json"],
        capture_output=True, text=True
    )
    namespaces = json.loads(result.stdout)

    now = datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=max_idle_hours)

    for ns in namespaces["items"]:
        name = ns["metadata"]["name"]
        if not name.startswith("dev-"):
            continue

        last_activity = get_namespace_last_activity(name)
        if last_activity < cutoff:
            idle_hours = (now - last_activity).total_seconds() / 3600
            print(f"Deleting idle namespace {name} (idle {idle_hours:.0f}h)")
            subprocess.run(["kubectl", "delete", "namespace", name])

if __name__ == "__main__":
    cleanup_idle_namespaces(max_idle_hours=48)

CronJob for overnight scale-down

# Scale staging to 0 replicas overnight
apiVersion: batch/v1
kind: CronJob
metadata:
  name: staging-scale-down
  namespace: staging
spec:
  schedule: "0 20 * * 1-5"  # Pondělí-Pátek 20:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler-sa
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - |
              kubectl scale deployment --all --replicas=0 -n staging
              kubectl scale statefulset --all --replicas=0 -n staging
          restartPolicy: OnFailure
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: staging-scale-up
  namespace: staging
spec:
  schedule: "0 8 * * 1-5"  # Pondělí-Pátek 8:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler-sa
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - |
              kubectl scale deployment --all --replicas=2 -n staging
          restartPolicy: OnFailure
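The schedule above (up 08:00–20:00, Monday–Friday, down the rest of the time including weekends) translates into savings that are easy to compute:

```python
# Hours staging actually runs under the 08:00-20:00 Mon-Fri schedule above,
# versus an always-on 24/7 baseline.
hours_per_week_on = 12 * 5      # 12h/day, 5 weekdays
hours_per_week_total = 24 * 7   # 168
savings = 1 - hours_per_week_on / hours_per_week_total
print(f"Staging runs {hours_per_week_on}h of {hours_per_week_total}h -> ~{savings:.0%} saved")
```

Roughly 64% of always-on staging cost disappears, which is where the 40–60% non-prod figure in the table below comes from once you account for workloads that cannot be stopped.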

FinOps — Kubecost and cost visibility

Kubecost for granular cost tracking

# Install Kubecost
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="your-token" \
  --set prometheus.server.persistentVolume.size=50Gi

Kubecost lets you see costs per namespace, deployment, label, or team — essential for a chargeback model.

Cost allocation labels

# Every workload should carry cost labels
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
  labels:
    app: api-service
    team: backend          # Owning team
    cost-center: "CC-1042" # Cost center
    environment: production
    product: core-platform

# Kubecost allocation query aggregated per team
curl "http://kubecost/model/allocation?window=30d&aggregate=label:team&idle=true" | \
  jq '.data[0] | to_entries | sort_by(-.value.totalCost) |
      .[] | "\(.key): $\(.value.totalCost | round)"'

Results — real numbers

After implementing these measures in a typical enterprise cluster:

Optimization                       | Typical savings
-----------------------------------|------------------------
Right-sizing requests (Goldilocks) | 20–30%
Spot instances (70% of workloads)  | 60–70% on compute
Scale-to-zero for dev/staging      | 40–60% on non-prod
Karpenter consolidation            | 10–20%
Cleanup of idle resources          | 5–15%
Total                              | 40–60% of total costs
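The individual percentages don't add up linearly: each applies to a different slice of the bill. A blended-savings sketch, where the cost split and the per-bucket rates are assumptions chosen to illustrate the arithmetic:

```python
# Illustrative blended-savings model; the cost split and rates are assumptions.
baseline_share = {"prod_compute": 0.55, "nonprod": 0.30, "idle_resources": 0.15}
savings_rate = {
    "prod_compute": 0.45,   # right-sizing + spot + consolidation combined
    "nonprod": 0.50,        # scale-to-zero overnight and on weekends
    "idle_resources": 0.80, # zombie workloads mostly deleted outright
}
total_saved = sum(baseline_share[k] * savings_rate[k] for k in baseline_share)
print(f"Blended savings: ~{total_saved:.0%} of total spend")
```

With this split the blended result lands around 50%, consistent with the 40–60% range in the table.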

Implementation plan

Week 1–2: Visibility
- Deploy Kubecost or OpenCost
- Add cost allocation labels to all workloads
- Audit resource utilization via Goldilocks

Week 3–4: Quick wins
- Right-size the top 20 most over-provisioned deployments
- Enable overnight scale-to-zero for dev/staging
- Clean up zombie workloads

Month 2: Automation
- Karpenter or Cluster Autoscaler with a spot pool
- HPA/KEDA for key services
- Automatic namespace cleanup

Month 3+: FinOps culture
- Chargeback reports per team
- Cost budgets and alerting
- Quarterly reviews with development teams

Conclusion

Kubernetes cost optimization is not a one-time action — it’s a continuous process. Start with visibility (Kubecost), continue with right-sizing (Goldilocks + VPA), and automate scaling (HPA, KEDA, Karpenter). The result is infrastructure that grows with your needs, not despite them.

CORE SYSTEMS helps enterprise organizations implement FinOps culture and Kubernetes cost governance. Contact us for an audit of your infrastructure.
