
Transfer Learning - Leveraging Pre-trained Models

10. 08. 2024 · 4 min read · intermediate

Transfer Learning is a technique that allows you to leverage knowledge learned on one task to solve another, similar problem. Instead of training a model from scratch, you can use pre-trained models and adapt them to your specific needs.

What is Transfer Learning

Transfer Learning represents one of the most effective techniques in modern machine learning. Instead of training a model from scratch, we reuse knowledge already learned on large datasets and adapt it to our specific problem. This approach saves time and computational resources, and it often achieves better results than training from the ground up.

The basic idea is simple: a model that has learned to recognize general patterns in data (such as edges, textures, or linguistic structures) can apply this knowledge to related tasks. We then only need to “fine-tune” the last layers for our specific domain.

Types of Transfer Learning

We distinguish several main approaches:

  • Feature Extraction - freeze the weights of the pre-trained model and use it as a feature extractor
  • Fine-tuning - gradually unfreeze and retrain some layers on our data
  • Domain Adaptation - adapt the model to a new type of data (e.g., from photographs to drawings); a minimal sketch follows this list
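
Feature extraction and fine-tuning are covered with code in the sections below. Domain adaptation comes in many flavors; as one minimal, illustrative sketch (not the only approach), the snippet below re-estimates the BatchNorm statistics of a pre-trained network on unlabeled data from the new domain (AdaBN-style). The loader target_loader is a placeholder, not defined elsewhere in this article.

import torch

# Placeholder: an unlabeled DataLoader over the target domain (e.g., drawings)
# target_loader = DataLoader(target_dataset, batch_size=32)

def adapt_batchnorm_stats(model, target_loader, device="cpu"):
    """Re-estimate BatchNorm running statistics on the target domain (AdaBN-style)."""
    model.to(device)
    model.train()  # in train mode, BatchNorm layers update their running statistics
    for param in model.parameters():
        param.requires_grad = False  # no weights are trained, only BN buffers change
    with torch.no_grad():
        for batch in target_loader:
            images = batch[0] if isinstance(batch, (list, tuple)) else batch
            model(images.to(device))
    model.eval()
    return model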

Feature Extraction in Practice

The simplest approach uses a pre-trained model as a black box for feature extraction:

import torch
import torchvision.models as models
from torch import nn

# Load pre-trained ResNet
base_model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze all parameters
for param in base_model.parameters():
    param.requires_grad = False

# Replace classifier for our task (e.g., 10 classes)
base_model.fc = nn.Linear(base_model.fc.in_features, 10)

# Only the new layer will be trained
optimizer = torch.optim.Adam(base_model.fc.parameters(), lr=0.001)
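
A minimal training loop for the new head could then look like this; train_loader is an assumed DataLoader yielding (images, labels) batches and is not defined above.

criterion = nn.CrossEntropyLoss()
base_model.train()

for images, labels in train_loader:
    optimizer.zero_grad()
    outputs = base_model(images)          # frozen backbone, trainable head
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()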

Gradual Fine-tuning

A more sophisticated approach gradually “unfreezes” layers for retraining:

class TransferModel(nn.Module):
    def __init__(self, num_classes, freeze_layers=True):
        super().__init__()
        self.backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

        if freeze_layers:
            # Freeze first layers
            for param in self.backbone.layer1.parameters():
                param.requires_grad = False
            for param in self.backbone.layer2.parameters():
                param.requires_grad = False

        # Modify classifier
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, num_classes)

    def unfreeze_layers(self, layer_names):
        """Gradual layer unfreezing"""
        for name in layer_names:
            layer = getattr(self.backbone, name)
            for param in layer.parameters():
                param.requires_grad = True

model = TransferModel(num_classes=10)

# After several epochs, we can unfreeze additional layers
model.unfreeze_layers(['layer2', 'layer3'])
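
When layers are frozen, only the trainable parameters should be passed to the optimizer. A small sketch (the learning rate is an arbitrary choice):

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-4
)

# Note: parameters unfrozen later are not yet known to the optimizer;
# add them via optimizer.add_param_group() or recreate the optimizer.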

Transfer Learning for NLP

In the field of natural language processing, transfer learning is even more important. Models like BERT, GPT, or RoBERTa are trained on massive text corpora and can capture complex linguistic patterns.

Fine-tuning BERT for Classification

from transformers import BertForSequenceClassification, BertTokenizer
from transformers import TrainingArguments, Trainer

# Load pre-trained BERT
model = BertForSequenceClassification.from_pretrained(
    'bert-base-multilingual-cased',
    num_labels=3  # E.g., sentiment: positive, negative, neutral
)

tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')

# Data preparation
def tokenize_function(examples):
    return tokenizer(
        examples['text'], 
        truncation=True, 
        padding=True, 
        max_length=512
    )

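# Assumption: train_dataset and eval_dataset are Hugging Face Dataset objects
# with a 'text' column and integer labels, prepared beforehand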
train_dataset = train_dataset.map(tokenize_function, batched=True)

# Training setup
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    warmup_steps=500,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

trainer.train()
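
After training, the same objects can be used for evaluation and inference. A brief sketch (the example sentence is made up):

import torch

# Evaluate on the validation set
metrics = trainer.evaluate()

# Classify a new text
inputs = tokenizer("The product exceeded my expectations.",
                   return_tensors="pt", truncation=True, max_length=512)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    logits = model(**inputs).logits
predicted_class = logits.argmax(dim=-1).item()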

Best Practices

Choosing the Learning Rate

When fine-tuning, it’s crucial to properly set the learning rate. Generally:

  • For new layers: higher learning rate (1e-3 to 1e-4)
  • For pre-trained layers: lower learning rate (1e-5 to 1e-6)
  • Gradual reduction with continued training

# Differentiated learning rates for different parts of the model
def get_optimizer_grouped_parameters(model, backbone_lr=1e-5, head_lr=1e-3):
    no_decay = ["bias", "LayerNorm.weight"]

    # Backbone parameters, excluding the newly added classification head "fc"
    backbone_params = [
        (n, p) for n, p in model.backbone.named_parameters()
        if not n.startswith("fc")
    ]

    optimizer_grouped_parameters = [
        {
            "params": [p for n, p in backbone_params
                       if not any(nd in n for nd in no_decay)],
            "weight_decay": 0.01,
            "lr": backbone_lr
        },
        {
            "params": [p for n, p in backbone_params
                       if any(nd in n for nd in no_decay)],
            "weight_decay": 0.0,
            "lr": backbone_lr
        },
        {
            # The new head is trained with a higher learning rate
            "params": model.backbone.fc.parameters(),
            "lr": head_lr
        }
    ]

    return torch.optim.AdamW(optimizer_grouped_parameters)
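
The third point above, gradually reducing the learning rate, is typically handled by a scheduler. A usage sketch combining it with the helper above (the epoch count of 20 is arbitrary):

optimizer = get_optimizer_grouped_parameters(model)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)

# call scheduler.step() once per epoch after the training loop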

Data Augmentation and Regularization

With smaller datasets, it’s important to prevent overfitting:

import torchvision.transforms as transforms

# Augmentation for computer vision
transform = transforms.Compose([
    transforms.RandomRotation(15),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ToTensor(),  # Normalize expects a tensor, not a PIL image
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
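
# Usage sketch: the augmentation pipeline plugged into a dataset and DataLoader.
# The folder 'data/train' is hypothetical (one sub-folder per class).
from torchvision import datasets
from torch.utils.data import DataLoader

train_data = datasets.ImageFolder('data/train', transform=transform)
train_loader = DataLoader(train_data, batch_size=32, shuffle=True)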

# Dropout in custom layers
class FineTunedModel(nn.Module):
    def __init__(self, base_model, num_classes):
        super().__init__()
        in_features = base_model.fc.in_features
        base_model.fc = nn.Identity()  # backbone now returns pooled features
        self.backbone = base_model
        self.dropout = nn.Dropout(0.3)
        self.classifier = nn.Linear(in_features, num_classes)

    def forward(self, x):
        features = self.backbone(x)          # (batch, in_features)
        dropped = self.dropout(features)
        return self.classifier(dropped)
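
Usage could look like this; the backbone choice and the 10 classes are illustrative:

base = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model = FineTunedModel(base, num_classes=10)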

Practical Tips for Successful Transfer

Domain similarity: The more similar the source and target domains are, the better results we can expect. A model trained on general photographs will adapt better to medical images than to satellite data.

Dataset size: For small datasets (hundreds of samples), feature extraction is a safer choice. For larger datasets (thousands of samples), we can experiment with fine-tuning.

Gradual unfreezing: Instead of unfreezing all layers at once, we unfreeze them gradually from the top layers down:

def gradual_unfreeze_schedule(model, optimizer, epoch):
    """Unfreeze deeper blocks and reduce the learning rate as training progresses."""
    if epoch == 5:
        # From epoch 5, unfreeze the top block
        for param in model.backbone.layer4.parameters():
            param.requires_grad = True

    if epoch == 10:
        # From epoch 10, unfreeze the next block as well
        for param in model.backbone.layer3.parameters():
            param.requires_grad = True

    # Halve the learning rate at each unfreezing milestone
    if epoch in (5, 10):
        for param_group in optimizer.param_groups:
            param_group['lr'] *= 0.5
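
In the training loop, the schedule is applied once per epoch; train_one_epoch and num_epochs are placeholders for your own training code:

for epoch in range(num_epochs):
    gradual_unfreeze_schedule(model, optimizer, epoch)
    train_one_epoch(model, train_loader, optimizer)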

Summary

Transfer Learning represents a fundamental shift in how we approach machine learning. Instead of training models from scratch, we draw on the collective “knowledge” stored in pre-trained models. The keys to success are choosing the right strategy (feature extraction vs. fine-tuning), carefully setting the learning rates for different parts of the model, and unfreezing layers step by step. As pre-trained models such as foundation models continue to grow, transfer learning is becoming an even more important tool for the efficient development of AI applications.

transfer learning · pre-training · fine-tuning

CORE SYSTEMS Team

We build core systems and AI agents that keep operations running. 15 years of experience in enterprise IT.