AI Testing — How to Test Non-Deterministic Software

assert response == expected — doesn’t work with LLMs. The answer is different every time. We need a new testing paradigm.

New Approaches¶

Property-based testing: Test properties, not exact output. Metamorphic testing: A small change in input must not change the facts. LLM-as-judge: GPT-4 evaluates based on a rubric.

Evaluation Pipeline¶

Golden dataset: 100+ pairs
Automatic run on every PR
Metrics: faithfulness, relevance, toxicity
Regression detection: alert on >5% drop

Red Teaming¶

Automated adversarial testing: prompt injection, jailbreak, PII leakage. In CI, not as a one-off.

AI Testing Is Software Testing 2.0¶

Property-based tests + LLM-as-judge + evaluation pipeline = production-ready.

ai testingqualitytestingautomation

CORE SYSTEMS

We build core systems and AI agents that keep operations running. 15 years of experience with enterprise IT.

Need help with implementation?

Our experts can help with design, implementation, and operations. From architecture to production.

AI Testing — How to Test Non-Deterministic Software

New Approaches¶

Evaluation Pipeline¶

Red Teaming¶

AI Testing Is Software Testing 2.0¶

CORE SYSTEMS

Need help with implementation?

Related articles

LLM Evaluation — How to Measure the Quality of Text-Generating AI

Unit Testing with JUnit and Mockito

Automated UI Testing with Selenium WebDriver