A year after ChatGPT, our clients are asking: “How do we get this into our systems?” Not as a chatbot on the website — anyone can do that. But as an integral part of business processes: automated contract analysis, intelligent internal knowledge base search, report generation. After six months of LLM projects, we share what works and what doesn’t.
RAG — Retrieval Augmented Generation¶
Fine-tuning is expensive and unnecessary for most enterprise use cases. RAG is more pragmatic: the user asks a question → the system finds relevant documents from the internal database → sends them to the LLM as context → the LLM generates an answer with source citations.
Our RAG stack: Azure OpenAI (GPT-4) for generation, Azure AI Search for vector search, LangChain for orchestration. Documents chunked, embedded, indexed. It works surprisingly well for knowledge bases and FAQ systems.
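The retrieve → augment → generate loop can be sketched in a few lines. This is a toy illustration, not our production code: the word-overlap "similarity" stands in for real Azure OpenAI embeddings, the document list stands in for Azure AI Search, and the final LLM call is omitted.

```python
import re

def tokens(text: str) -> set:
    # Stand-in for a real embedding: a bag of lowercase word tokens.
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question: str, docs: list[dict], k: int = 2) -> list[dict]:
    # Rank documents by word overlap with the question (toy similarity;
    # in production this is a vector search over embedded chunks).
    q = tokens(question)
    ranked = sorted(docs, key=lambda d: len(q & tokens(d["text"])), reverse=True)
    return ranked[:k]

def build_prompt(question: str, chunks: list[dict]) -> str:
    # Inject retrieved chunks as context, with source IDs for citations.
    context = "\n".join(f"[{c['id']}] {c['text']}" for c in chunks)
    return (
        "Answer ONLY from the context below and cite sources like [doc-1].\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

docs = [
    {"id": "doc-1", "text": "Vacation requests are filed in the HR portal."},
    {"id": "doc-2", "text": "Invoices above 10k EUR need CFO approval."},
]
chunks = retrieve("How do I request vacation?", docs)
prompt = build_prompt("How do I request vacation?", chunks)
# `prompt` would now go to GPT-4, e.g. via LangChain's AzureChatOpenAI.
```

The key property to preserve in any real implementation: the model only ever sees retrieved context plus the question, so every answer can be traced back to a source document.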
Prompt Engineering — More Science Than Art¶
System prompts with clear instructions, few-shot examples, chain-of-thought for complex reasoning. Guardrails: “Respond ONLY based on the provided context. If you don’t have the information, say so.” Without guardrails, LLMs happily hallucinate — and in enterprise, that’s unacceptable.
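Concretely, a guardrailed prompt combines a strict system message with a few-shot refusal example. The wording below is illustrative, not our exact production prompt:

```python
# Illustrative guardrailed prompt in chat-completions message format.
SYSTEM_PROMPT = (
    "You are an internal knowledge assistant.\n"
    "Respond ONLY based on the provided context.\n"
    "If the context does not contain the answer, reply exactly: "
    "'I don't have that information in the knowledge base.'\n"
    "Always cite the source document ID."
)

def build_messages(context: str, question: str) -> list[dict]:
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        # Few-shot example: teaches the model to refuse when context is missing.
        {"role": "user", "content": "Context: (empty)\nQuestion: What is our 2025 revenue?"},
        {"role": "assistant", "content": "I don't have that information in the knowledge base."},
        {"role": "user", "content": f"Context: {context}\nQuestion: {question}"},
    ]

messages = build_messages(
    "[doc-7] Travel expenses are reimbursed within 14 days.",
    "How fast are travel expenses reimbursed?",
)
```

The few-shot refusal turn matters in practice: showing the model one example of saying "I don't know" works better than only telling it to.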
Use Case: Contract Analysis¶
A legal department at an insurance company processes hundreds of contracts monthly. The LLM extracts key clauses, identifies risks, and compares against a standard template. Result: 60% reduction in review time. The lawyer still makes the decisions — the LLM is an assistant, not a replacement.
Use Case: Internal Helpdesk¶
RAG over internal documentation (Confluence, SharePoint). An employee asks “how to request vacation” or “what’s the invoice approval process” and receives an answer with a link to the source document. 40% reduction in IT helpdesk tickets.
Security and Governance¶
Data leakage: company data must not go to the public OpenAI API. Azure OpenAI with a private endpoint — data stays in the Azure tenant.
PII filtering: before sending to the LLM, we mask personal data (names, national ID numbers, addresses). After processing, we de-mask.
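A simplified sketch of reversible masking. The regexes below are illustrative only; production PII detection should use a proper NER/PII service (e.g. Azure AI Language) rather than hand-rolled patterns:

```python
import re

# Illustrative PII patterns (real detection needs far more robust rules).
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\+?\d[\d \-]{7,}\d"),
}

def mask(text: str) -> tuple[str, dict]:
    # Replace each PII hit with a placeholder token, remember the mapping.
    mapping = {}
    for label, pattern in PII_PATTERNS.items():
        for i, match in enumerate(pattern.findall(text)):
            token = f"<{label}_{i}>"
            mapping[token] = match
            text = text.replace(match, token, 1)
    return text, mapping

def unmask(text: str, mapping: dict) -> str:
    # Restore the original values after the LLM response comes back.
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

masked, mapping = mask("Contact jan.novak@example.com or +420 777 123 456.")
# `masked` goes to the LLM; the response is then passed through unmask().
restored = unmask(masked, mapping)
```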
Audit trail: we log every prompt and response. Who asked, what they asked, what answer they received. A necessity for regulated industries.
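A minimal shape for such a record, one JSON line per interaction. The field names are illustrative; in production the records go to an append-only store, not stdout:

```python
import json
import datetime

def audit_record(user_id: str, prompt: str, response: str) -> str:
    # One structured log line per LLM interaction: who, what, when, answer.
    return json.dumps({
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user": user_id,
        "prompt": prompt,
        "response": response,
    })
```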
Content filter: Azure OpenAI has built-in content filtering. Plus custom validation — responses must not contain competitive information, financial advice, or legal conclusions without a disclaimer.
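The custom layer can be as simple as a post-generation check that appends a disclaimer when a response strays into flagged territory. The term list below is a placeholder, not our production ruleset:

```python
# Illustrative post-generation validation on top of the built-in content filter.
FLAGGED_TERMS = ["legal advice", "you should invest", "guaranteed return"]
DISCLAIMER = "This is not legal or financial advice."

def validate_response(text: str) -> str:
    # Append a disclaimer when the response touches flagged topics.
    lowered = text.lower()
    if any(term in lowered for term in FLAGGED_TERMS) and DISCLAIMER not in text:
        return text + "\n\n" + DISCLAIMER
    return text
```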
Costs and Scaling¶
GPT-4 Turbo: ~€12 per million input tokens. For 1,000 queries per day averaging 2,000 input tokens each, that's 2 million tokens, or roughly €24/day. Manageable. But embeddings, the vector DB, and infrastructure push total TCO higher. Budget €800–2,000/month for a production RAG system.
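The back-of-the-envelope math is worth keeping as a function so it can be re-run as prices and volumes change. Note this covers input tokens only; output tokens are billed separately at a higher rate:

```python
PRICE_PER_M_INPUT_EUR = 12.0  # GPT-4 Turbo input price cited above

def daily_input_cost_eur(queries_per_day: int, tokens_per_query: int) -> float:
    # Input-token cost only; output tokens are billed separately.
    tokens = queries_per_day * tokens_per_query
    return tokens / 1_000_000 * PRICE_PER_M_INPUT_EUR

cost = daily_input_cost_eur(1_000, 2_000)  # -> 24.0 EUR/day
```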
What Doesn’t Work (Yet)¶
Accuracy for critical decisions: LLMs hallucinate. For systems where an error means financial loss, you need a human in the loop.
Structured output: JSON extraction from unstructured text is still unreliable (function calling helps, but it isn't 100%).
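Because of that unreliability, we treat LLM JSON output as untrusted input: parse, validate the required fields, and let the caller retry on failure. A minimal sketch (the field names are illustrative, loosely modeled on the contract-analysis case; the retry loop itself is omitted):

```python
import json

REQUIRED_FIELDS = {"party", "termination_notice_days", "liability_cap"}
FENCE = "`" * 3  # LLMs often wrap JSON in a markdown code fence

def parse_extraction(raw: str):
    """Parse LLM output as JSON; return None so the caller can retry."""
    raw = raw.strip()
    if raw.startswith(FENCE):
        # Strip the markdown fence and the optional "json" language tag.
        raw = raw.strip("`").removeprefix("json").strip()
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_FIELDS <= data.keys():
        return None
    return data

good = parse_extraction(
    '{"party": "Acme", "termination_notice_days": 30, "liability_cap": "100k EUR"}'
)
bad = parse_extraction("Sure! Here is the JSON you asked for...")
```

Returning `None` instead of raising keeps the retry decision with the caller, which can re-prompt with a corrective instruction or escalate to a human.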
LLM Is Infrastructure, Not a Product¶
Don’t dismiss it as hype, but don’t think a ChatGPT wrapper is an enterprise solution. RAG, guardrails, monitoring, security — that’s what turns an LLM demo into a production system. And that difference is 80% of the work.
Need help with implementation?
Our experts can help with design, implementation, and operations. From architecture to production.
Contact us