
Memory for AI Agents

18. 01. 2026 · 4 min read · intermediate

Memory is a key component of modern AI agents and LLM-based systems. It enables them to retain context from previous interactions and use acquired information for better decision-making.

Memory for AI Agents: Key to Intelligent Behavior

AI agents achieve true usefulness only when they can remember the context of previous interactions and gradually build knowledge. Without memory mechanisms, every conversation is isolated and the agent always starts from zero. In this article, we’ll show how to implement different types of memory for AI agents and what challenges this brings.

Types of Memory for AI Agents

Memory systems for AI agents can be divided into several categories based on time horizon and information storage methods:

Short-term Memory

The simplest form of memory is maintaining context during a single conversation. Modern LLMs have a limited context window, so we must implement strategies for efficient token management:

from typing import Dict, List

class ShortTermMemory:
    def __init__(self, max_tokens: int = 4000):
        self.messages: List[Dict] = []
        self.max_tokens = max_tokens

    def add_message(self, role: str, content: str):
        message = {"role": role, "content": content}
        self.messages.append(message)
        self._trim_if_needed()

    def _count_tokens(self) -> int:
        # Rough estimate: ~4 characters per token; swap in a real
        # tokenizer (e.g. tiktoken) for production use
        return sum(len(m["content"]) for m in self.messages) // 4

    def _trim_if_needed(self):
        # Simple strategy - remove oldest messages,
        # always keeping the system message at index 0
        current_tokens = self._count_tokens()
        while current_tokens > self.max_tokens and len(self.messages) > 2:
            self.messages.pop(1)  # Keep system message
            current_tokens = self._count_tokens()

    def get_context(self) -> List[Dict]:
        return self.messages.copy()
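To see the trimming strategy in isolation, here is a self-contained sketch using a naive 4-characters-per-token estimate (`trim_history` and `estimate_tokens` are illustrative helpers, not part of any library):

```python
from typing import Dict, List

def estimate_tokens(text: str) -> int:
    # Naive heuristic: ~4 characters per token
    return max(1, len(text) // 4)

def trim_history(messages: List[Dict], max_tokens: int) -> List[Dict]:
    # Keep the system message (index 0) and drop the oldest turns
    # until the estimated total fits the token budget
    trimmed = list(messages)
    total = sum(estimate_tokens(m["content"]) for m in trimmed)
    while total > max_tokens and len(trimmed) > 2:
        removed = trimmed.pop(1)
        total -= estimate_tokens(removed["content"])
    return trimmed

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "x" * 400},  # ~100 tokens
    {"role": "user", "content": "y" * 400},  # ~100 tokens
    {"role": "user", "content": "latest question"},
]
short = trim_history(history, max_tokens=120)
print(len(short))  # 3 - the oldest long message was dropped
```

Note that the latest user message always survives, which matters: dropping it would make the agent answer a question it never saw.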

Long-term Memory

For truly intelligent behavior, we need to store information across sessions. This typically combines a vector database with structured storage:

from typing import Dict, List, Optional

import chromadb
from sentence_transformers import SentenceTransformer

class LongTermMemory:
    def __init__(self, collection_name: str = "agent_memory"):
        self.client = chromadb.Client()
        # get_or_create avoids an error if the collection already exists
        self.collection = self.client.get_or_create_collection(collection_name)
        self.encoder = SentenceTransformer('all-MiniLM-L6-v2')

    def store_memory(self, content: str, metadata: Optional[Dict] = None):
        # Create embedding
        embedding = self.encoder.encode([content])[0].tolist()

        # Store in vector DB
        self.collection.add(
            embeddings=[embedding],
            documents=[content],
            metadatas=[metadata or {}],
            ids=[str(hash(content))]
        )

    def retrieve_relevant_memories(self, query: str, n_results: int = 3) -> List[str]:
        query_embedding = self.encoder.encode([query])[0].tolist()

        results = self.collection.query(
            query_embeddings=[query_embedding],
            n_results=n_results
        )

        return results['documents'][0] if results['documents'] else []
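Under the hood, the vector database is doing nearest-neighbor search over embeddings. A dependency-free sketch of that core operation, with made-up 3-dimensional vectors standing in for real embeddings (a real encoder produces hundreds of dimensions):

```python
import math
from typing import List, Tuple

def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: List[float],
          store: List[Tuple[str, List[float]]],
          k: int = 2) -> List[str]:
    # Rank stored documents by cosine similarity to the query vector
    ranked = sorted(store, key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [doc for doc, _ in ranked[:k]]

# Toy "embeddings" for illustration only
store = [
    ("user prefers dark mode", [0.9, 0.1, 0.0]),
    ("user's name is Alice",   [0.1, 0.9, 0.0]),
    ("meeting set for Friday", [0.0, 0.2, 0.9]),
]
print(top_k([0.8, 0.2, 0.1], store, k=1))  # ['user prefers dark mode']
```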

RAG (Retrieval-Augmented Generation) Integration

Memory systems are most commonly implemented using the RAG pattern, where the agent searches for relevant information from its memory before generating a response:

from datetime import datetime
from typing import Dict, List

class MemoryEnhancedAgent:
    def __init__(self, llm_client, memory_system):
        self.llm = llm_client
        self.memory = memory_system

    async def process_query(self, user_input: str) -> str:
        # 1. Search for relevant memories
        relevant_memories = self.memory.retrieve_relevant_memories(
            user_input, n_results=5
        )

        # 2. Build context with memory
        context = self._build_context_with_memory(
            user_input, 
            relevant_memories
        )

        # 3. Generate response
        response = await self.llm.generate(context)

        # 4. Store interaction in memory
        self.memory.store_memory(
            f"User: {user_input}\nAssistant: {response}",
            metadata={
                "timestamp": datetime.now().isoformat(),
                "type": "conversation"
            }
        )

        return response

    def _build_context_with_memory(self, query: str, memories: List[str]) -> str:
        memory_context = "\n".join([f"Memory: {mem}" for mem in memories])

        return f"""
        Relevant memories from previous interactions:
        {memory_context}

        Current user query: {query}

        Please respond based on the context and memories above.
        """

Hierarchical Memory Structure

For more complex applications, we can implement a hierarchical memory structure in which different types of information have different priorities and storage methods:

from typing import Dict, List

class HierarchicalMemory:
    def __init__(self):
        self.episodic_memory = LongTermMemory("episodic")  # Specific events
        self.semantic_memory = LongTermMemory("semantic")  # General knowledge
        self.procedural_memory = {}  # Learned procedures

    def store_interaction(self, interaction_data: Dict):
        # Episodic memory - specific interactions
        self.episodic_memory.store_memory(
            interaction_data['content'],
            {**interaction_data['metadata'], 'type': 'episodic'}
        )

        # Extract and store general knowledge
        facts = self._extract_facts(interaction_data['content'])
        for fact in facts:
            self.semantic_memory.store_memory(
                fact,
                {'type': 'semantic', 'confidence': 0.8}
            )

    def retrieve_context(self, query: str) -> Dict:
        episodic = self.episodic_memory.retrieve_relevant_memories(query, 3)
        semantic = self.semantic_memory.retrieve_relevant_memories(query, 2)

        return {
            'episodic_memories': episodic,
            'semantic_knowledge': semantic,
            'procedures': self._find_relevant_procedures(query)
        }

    def _extract_facts(self, content: str) -> List[str]:
        # Here would be fact extraction implementation using NLP
        # For simplicity, return placeholder
        return []

    def _find_relevant_procedures(self, query: str) -> List[str]:
        # Search in procedural memory
        return []
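The `_extract_facts` placeholder can be prototyped with simple pattern rules before reaching for an NLP model. A naive regex sketch (the patterns here are illustrative assumptions; a production system would use an LLM or named-entity recognition):

```python
import re
from typing import List

def extract_facts(content: str) -> List[str]:
    # Naive rule-based extraction: capture "my name is X" and
    # "I prefer/like/use Y" style statements
    patterns = [
        r"\b(my name is [A-Z][a-z]+)",
        r"\b(I (?:prefer|like|use) [\w\s-]+?)(?=[.,\n]|$)",
    ]
    facts = []
    for pattern in patterns:
        facts.extend(m.group(1) for m in re.finditer(pattern, content))
    return facts

text = "Hi, my name is Alice. I prefer dark mode, thanks."
facts = extract_facts(text)
print(facts)  # ['my name is Alice', 'I prefer dark mode']
```

Rules like these are brittle, but they make the hierarchical split concrete: the raw interaction goes to episodic memory while the extracted statements go to semantic memory.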

Optimization and Challenges

Implementing memory for AI agents brings several technical challenges:

Memory Size Management

Memory can quickly grow to an unmanageable size, so we need to implement strategies for cleanup and archiving:

from datetime import datetime, timedelta

class MemoryManager:
    def __init__(self, memory_system, max_memories: int = 10000):
        self.memory = memory_system
        self.max_memories = max_memories

    def cleanup_old_memories(self, retention_days: int = 30):
        cutoff = datetime.now() - timedelta(days=retention_days)

        # Implementation depends on the specific database.
        # Example for ChromaDB's metadata filter; note that the $lt
        # operator compares numbers, so storing epoch timestamps is
        # more robust than ISO strings
        old_memories = self.memory.collection.get(
            where={"timestamp": {"$lt": cutoff.timestamp()}}
        )

        if len(old_memories['ids']) > 0:
            self.memory.collection.delete(ids=old_memories['ids'])

    def compress_memories(self):
        # Summarize old memories using an LLM,
        # preserving only key information
        pass
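A common refinement is to score memories before deleting, so that recent or frequently used items survive the cleanup. A sketch with an exponential recency decay (the half-life and access bonus are illustrative assumptions, to be tuned per application):

```python
import math
from datetime import datetime, timedelta

def retention_score(last_access: datetime,
                    access_count: int,
                    now: datetime,
                    half_life_days: float = 7.0) -> float:
    # Recency decays exponentially with the given half-life;
    # each recorded access adds a flat bonus
    age_days = (now - last_access).total_seconds() / 86400
    recency = math.exp(-math.log(2) * age_days / half_life_days)
    return recency + 0.1 * access_count

now = datetime(2026, 1, 18)
fresh = retention_score(now - timedelta(days=1), 0, now)
stale_but_used = retention_score(now - timedelta(days=30), 12, now)
stale = retention_score(now - timedelta(days=30), 0, now)
print(fresh > stale, stale_but_used > stale)  # True True
```

Cleanup then deletes the lowest-scoring memories first instead of applying a hard age cutoff.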

Consistency and Updates

Memory can contain outdated or conflicting information. We need mechanisms for updates and conflict resolution:

from datetime import datetime

class ConsistentMemory(LongTermMemory):
    def update_memory(self, old_content: str, new_content: str):
        # Find similar memories
        similar = self.retrieve_relevant_memories(old_content, n_results=10)

        # Mark as outdated and add new version
        for memory in similar:
            if self._is_conflicting(memory, new_content):
                self._mark_as_outdated(memory)

        self.store_memory(
            new_content, 
            {"type": "updated", "timestamp": datetime.now().isoformat()}
        )

    def _is_conflicting(self, memory1: str, memory2: str) -> bool:
        # Conflict detection implementation
        # Can use embedding similarity or LLM
        return False

    def _mark_as_outdated(self, memory: str):
        # Mark memory as outdated
        pass
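The `_is_conflicting` placeholder can be prototyped with a simple heuristic: two statements about the same subject with different values are likely in conflict. A minimal sketch (the "subject is value" parsing is an illustrative assumption; a real system would use embedding similarity or an LLM judge, as noted in the comments above):

```python
from typing import Optional, Tuple

def parse_fact(fact: str) -> Optional[Tuple[str, str]]:
    # Treat "subject is value" statements as (subject, value) pairs
    if " is " in fact:
        subject, value = fact.split(" is ", 1)
        return subject.strip().lower(), value.strip().lower()
    return None

def is_conflicting(memory1: str, memory2: str) -> bool:
    # Same subject, different value -> likely conflict
    a, b = parse_fact(memory1), parse_fact(memory2)
    return a is not None and b is not None and a[0] == b[0] and a[1] != b[1]

print(is_conflicting("user's plan is free", "user's plan is premium"))  # True
print(is_conflicting("user's plan is free", "user's name is Alice"))    # False
```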

Summary

Memory systems are a critical component of modern AI agents. Combining short-term and long-term memory with RAG patterns makes it possible to build truly intelligent assistants capable of learning and adaptation. When implementing them, it's crucial to consider performance optimization, data size management, and information consistency. As LLM capabilities and vector databases mature, memory systems will become even more important for building sophisticated AI applications.

Tags: memory, AI agents, RAG

CORE SYSTEMS Team

We build core systems and AI agents that keep operations running. 15 years of experience in enterprise IT.