MLOps Pipeline in Production: From Experiment to Reliable ML System

20. 02. 2026 · 6 min read · CORE SYSTEMS · AI

Most ML projects fail not because of a bad model, but because of poor infrastructure around it. 87% of ML models never make it to production (Gartner, 2025). MLOps is the discipline that solves this problem.

Anatomy of a Production ML Pipeline

┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│ Data        │──▶│ Feature     │──▶│ Training    │──▶│ Model       │
│ Ingestion   │   │ Engineering │   │ Pipeline    │   │ Registry    │
└─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘
                         │                                   │
                         ▼                                   ▼
                  ┌─────────────┐                     ┌─────────────┐
                  │ Feature     │                     │ Serving     │
                  │ Store       │────────────────────▶│ (API/Batch) │
                  └─────────────┘                     └─────────────┘
                                                             │
                                                             ▼
                                                      ┌─────────────┐
                                                      │ Monitoring  │
                                                      │ & Drift     │
                                                      └─────────────┘

1. Feature Store — The Heart of ML Infrastructure

A feature store is a central repository for ML features — cleaned, transformed data attributes ready for both training and inference.

Why a Feature Store?

  • Consistency: the same features in training and production (eliminating training-serving skew)
  • Reusability: features shared across teams and models
  • Temporal correctness: point-in-time correct joins (no data leakage)
  • Latency: online store for real-time serving (< 10ms)
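Point-in-time correctness is the subtle one. The idea can be sketched independently of any feature store with a pandas `merge_asof` join — the data below is purely illustrative: each training label must only see the latest feature value known *before* the label's event time, never a later one.

```python
import pandas as pd

# Labels: events we want to predict, each with an event timestamp.
labels = pd.DataFrame({
    "customer_id": [1, 1],
    "event_time": pd.to_datetime(["2026-01-10", "2026-01-20"]),
})

# Feature snapshots computed at different times.
features = pd.DataFrame({
    "customer_id": [1, 1, 1],
    "feature_time": pd.to_datetime(["2026-01-05", "2026-01-12", "2026-01-18"]),
    "total_orders_30d": [3, 5, 8],
})

# Point-in-time join: for each label, take the most recent feature row
# with feature_time <= event_time (direction="backward"). Using a later
# snapshot would be data leakage.
joined = pd.merge_asof(
    labels.sort_values("event_time"),
    features.sort_values("feature_time"),
    left_on="event_time",
    right_on="feature_time",
    by="customer_id",
    direction="backward",
)

print(joined["total_orders_30d"].tolist())  # [3, 8]
```

A feature store does this join for you at training-dataset build time, which is exactly what eliminates training-serving skew.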

Implementation

# Feast feature view definition
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

customer = Entity(name="customer_id", join_keys=["customer_id"])

customer_features = FeatureView(
    name="customer_features",
    entities=[customer],
    schema=[
        Field(name="total_orders_30d", dtype=Int64),
        Field(name="avg_order_value", dtype=Float32),
        Field(name="days_since_last_order", dtype=Int64),
        Field(name="churn_risk_score", dtype=Float32),
    ],
    source=FileSource(
        path="s3://features/customer_daily.parquet",
        timestamp_field="event_timestamp",
    ),
    ttl=timedelta(days=1),
)

Recommended tools (2026):

Tool                        Type                    Best for
Feast                       Open-source             Startups, flexibility
Tecton                      Managed                 Enterprise, real-time
Hopsworks                   Open-source + managed   Full MLOps platform
Databricks Feature Store    Managed                 Databricks ecosystem
Redis + custom              DIY                     Ultra-low latency

2. Training Pipeline — Reproducibility First

Every training run must be 100% reproducible:

# DVC pipeline (dvc.yaml)
stages:
  prepare:
    cmd: python src/prepare.py
    deps:
      - src/prepare.py
      - data/raw/
    outs:
      - data/processed/
    params:
      - prepare.split_ratio
      - prepare.seed

  train:
    cmd: python src/train.py
    deps:
      - src/train.py
      - data/processed/
    outs:
      - models/latest/
    params:
      - train.learning_rate
      - train.epochs
      - train.batch_size
    metrics:
      - metrics/train.json:
          cache: false

  evaluate:
    cmd: python src/evaluate.py
    deps:
      - src/evaluate.py
      - models/latest/
      - data/processed/test/
    metrics:
      - metrics/eval.json:
          cache: false
    plots:
      - metrics/confusion_matrix.csv
      - metrics/roc_curve.csv

Key Principles

  1. Data versioning — DVC, LakeFS, or Delta Lake
  2. Code versioning — Git (obviously)
  3. Environment versioning — Docker + requirements.txt with pinned versions
  4. Experiment tracking — MLflow, Weights & Biases, or Neptune
  5. Hyperparameter management — Hydra or config YAML in Git

# MLflow experiment tracking
import mlflow

with mlflow.start_run(run_name="xgboost-v3"):
    params = {
        "learning_rate": 0.01,
        "max_depth": 6,
        "n_estimators": 500,
        "feature_set": "customer_v3",
    }
    mlflow.log_params(params)

    model = train_model(X_train, y_train, params)
    metrics = evaluate_model(model, X_test, y_test)

    mlflow.log_metrics({
        "auc_roc": metrics["auc_roc"],
        "precision": metrics["precision"],
        "recall": metrics["recall"],
        "f1": metrics["f1"],
    })

    mlflow.sklearn.log_model(model, "model",
        registered_model_name="churn-predictor")

3. Model Registry — Governance and Lifecycle

Model registry = a central catalog of all models with versions, metadata, and lifecycle state.

┌──────────────────────────────────────┐
│         Model Registry               │
├──────────────────────────────────────┤
│ churn-predictor                      │
│   v1.0 → Archived                    │
│   v1.1 → Archived                    │
│   v2.0 → Production (since 2026-01)  │
│   v2.1 → Staging (canary 5%)         │
│                                      │
│ fraud-detector                       │
│   v3.0 → Production                  │
│   v3.1 → Staging                     │
│                                      │
│ recommendation-engine                │
│   v1.0 → Production                  │
└──────────────────────────────────────┘

Lifecycle states: None → Staging → Production → Archived

Governance checklist before Production:

  ✅ Metrics above threshold (AUC > 0.85, latency < 50ms)
  ✅ A/B test for at least 7 days
  ✅ Bias audit (fairness metrics)
  ✅ Data lineage documentation
  ✅ Rollback plan tested
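The metric part of such a checklist can be enforced in code as a promotion gate. A minimal sketch — the threshold names and limits below are illustrative assumptions, not prescribed by any registry API:

```python
# Minimal promotion gate: refuse to promote a model whose metrics fail
# governance thresholds. Names and limits here are illustrative.
THRESHOLDS = {
    "auc_roc": ("min", 0.85),         # quality floor
    "latency_p99_ms": ("max", 50.0),  # latency ceiling
    "ab_test_days": ("min", 7),       # minimum A/B test duration
}

def promotion_blockers(metrics: dict) -> list[str]:
    """Return the list of failed checks; an empty list means safe to promote."""
    blockers = []
    for name, (kind, limit) in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            blockers.append(f"{name}: missing")
        elif kind == "min" and value < limit:
            blockers.append(f"{name}: {value} < {limit}")
        elif kind == "max" and value > limit:
            blockers.append(f"{name}: {value} > {limit}")
    return blockers

print(promotion_blockers({"auc_roc": 0.91, "latency_p99_ms": 38.0, "ab_test_days": 9}))  # []
print(promotion_blockers({"auc_roc": 0.81, "latency_p99_ms": 62.0, "ab_test_days": 9}))  # two blockers
```

Running this gate in CI (and logging its output next to the registry entry) gives you the "who approved what, on which metrics" trail that governance requires.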

4. Model Serving — API and Batch

Online serving (real-time)

# FastAPI + ONNX Runtime for low latency
from fastapi import FastAPI
import onnxruntime as ort
import numpy as np

app = FastAPI()
session = ort.InferenceSession("model.onnx")

@app.post("/predict")
async def predict(features: dict):
    input_array = np.array([list(features.values())], dtype=np.float32)
    result = session.run(None, {"input": input_array})
    return {
        "prediction": int(result[0][0]),
        "probability": float(result[1][0][1]),
        "model_version": "v2.0",
    }

Latency optimization:

  • ONNX Runtime (2-5x faster than sklearn/pytorch)
  • Model quantization (FP32 → INT8)
  • Batching (dynamic micro-batching)
  • Feature caching (Redis for repeated queries)
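Dynamic micro-batching is the least obvious of these. A pure-Python sketch of the idea (the `MicroBatcher` class and the doubling `run_model` stand-in are illustrative, not a serving-framework API): requests arriving within a few milliseconds of each other are grouped into a single model call, trading a tiny latency budget for much higher throughput.

```python
import asyncio

class MicroBatcher:
    """Group requests arriving within a short window into one model call."""

    def __init__(self, run_model, max_batch=32, max_wait_ms=5):
        self.run_model = run_model          # stand-in for the inference session
        self.max_batch = max_batch
        self.max_wait = max_wait_ms / 1000
        self.queue = asyncio.Queue()
        self._worker = None

    async def predict(self, features):
        if self._worker is None:            # lazy-start the batching loop
            self._worker = asyncio.ensure_future(self._loop())
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((features, fut))
        return await fut

    async def _loop(self):
        while True:
            # Block for the first request, then collect more until the
            # batch is full or the wait budget is spent.
            items = [await self.queue.get()]
            deadline = asyncio.get_running_loop().time() + self.max_wait
            while len(items) < self.max_batch:
                remaining = deadline - asyncio.get_running_loop().time()
                if remaining <= 0:
                    break
                try:
                    items.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            results = self.run_model([f for f, _ in items])   # one batched call
            for (_, fut), result in zip(items, results):
                fut.set_result(result)

async def demo():
    batcher = MicroBatcher(run_model=lambda xs: [x * 2 for x in xs])
    return await asyncio.gather(*(batcher.predict(i) for i in range(8)))

print(asyncio.run(demo()))  # [0, 2, 4, 6, 8, 10, 12, 14]
```

Production servers such as Triton implement the same pattern natively; the sketch only shows why a few milliseconds of queueing can collapse many requests into one GPU pass.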

Batch serving

# Spark batch inference
from pyspark.sql import SparkSession
import mlflow

spark = SparkSession.builder.appName("batch-predict").getOrCreate()
model = mlflow.pyfunc.spark_udf(spark, "models:/churn-predictor/Production")

df = spark.read.parquet("s3://features/customer_daily/")
feature_columns = [c for c in df.columns if c != "customer_id"]  # model input columns
predictions = df.withColumn("churn_probability", model(*feature_columns))
predictions.write.parquet("s3://predictions/churn/2026-02-20/")

5. Monitoring — Model Decay Is Inevitable

ML models degrade. Data changes, the world changes. Monitoring is not a nice-to-have, it is a necessity.

What to Monitor

Category            Metrics                                      Alert
Data quality        Missing values, schema drift, volume         > 5% missing
Feature drift       PSI, KS test, Wasserstein distance           PSI > 0.2
Prediction drift    Distribution of outputs                      KS p < 0.01
Model performance   AUC, precision, recall (with ground truth)   AUC drop > 5%
Latency             p50, p95, p99                                p99 > 100ms
Throughput          Requests/sec, error rate                     Error rate > 1%

Drift detection

# Population Stability Index (PSI)
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI < 0.1 = stable, 0.1-0.2 = moderate, > 0.2 = significant drift."""
    breakpoints = np.quantile(expected, np.linspace(0, 1, bins + 1))
    expected_pct = np.histogram(expected, breakpoints)[0] / len(expected)
    actual_pct = np.histogram(actual, breakpoints)[0] / len(actual)

    # Avoid log(0)
    expected_pct = np.clip(expected_pct, 0.001, None)
    actual_pct = np.clip(actual_pct, 0.001, None)

    return np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct))

Retraining Triggers

  1. Scheduled: weekly/monthly retrain on fresh data
  2. Drift-based: automatic retrain when PSI > 0.2
  3. Performance-based: retrain when a metric drops below threshold
  4. Event-based: retrain after a significant business change
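The first three triggers can be combined into one decision function that the monitoring job calls on a schedule. A minimal sketch — the function name and thresholds are illustrative:

```python
# Combine retraining triggers into one decision; thresholds are illustrative.
def retrain_reasons(psi_value: float, auc_now: float, auc_baseline: float,
                    days_since_train: int, max_age_days: int = 30) -> list[str]:
    """Return which triggers fired; retrain when the list is non-empty."""
    reasons = []
    if days_since_train >= max_age_days:
        reasons.append("scheduled")      # fresh-data cadence reached
    if psi_value > 0.2:
        reasons.append("drift")          # significant feature drift (PSI)
    if auc_now < 0.95 * auc_baseline:
        reasons.append("performance")    # metric dropped more than 5%
    return reasons

print(retrain_reasons(psi_value=0.25, auc_now=0.84, auc_baseline=0.90, days_since_train=10))
# ['drift', 'performance']
```

Event-based triggers stay manual by design: a pricing change or a new market is something the business tells the pipeline, not something the pipeline infers.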

6. CI/CD for ML

# GitHub Actions — ML pipeline CI/CD
name: ML Pipeline
on:
  push:
    paths: ['src/**', 'configs/**', 'data/dvc.lock']

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt
      - run: pytest tests/ -v
      - run: python src/validate_data.py
      - run: python src/train.py --config configs/ci.yaml
      - run: python src/evaluate.py --threshold-file configs/thresholds.yaml

  deploy-staging:
    needs: test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt
      - run: mlflow models serve -m "models:/churn-predictor/Staging" --port 8080 &
      - run: python tests/integration/test_serving.py
      - run: kubectl apply -f k8s/staging/

  promote-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production  # Manual approval gate
    steps:
      - run: python scripts/promote_model.py --from staging --to production
      - run: kubectl apply -f k8s/production/

Conclusion

MLOps in 2026 is not about tools — it is about discipline. The key principles:

  1. Reproducibility — every experiment must be repeatable
  2. Automation — manual steps = source of errors
  3. Monitoring — a model without monitoring is a ticking time bomb
  4. Governance — who approved the model, on what data, with what metrics

CORE SYSTEMS implements MLOps pipelines from architecture design to production operations. We help companies move ML models from Jupyter to production — reliably and securely.


Want to implement MLOps in your organization? Contact us for a consultation.

Tags: mlops · machine-learning · pipeline · production · monitoring · feature-store