I read through these 12 mini-articles (reading them all takes less than 15 minutes) and found plenty of advice that matches what I have learned first-hand from working with LLMs and from building and deploying LLM-based products in production.
Audience: any stakeholder making strategic decisions about how and where to use LLMs within a product; any delivery manager with influence over implementation decisions in LLM-based product or engineering work; any senior or lead engineer making implementation decisions around LLMs.
Resource
https://github.com/humanlayer/12-factor-agents
Summary
Key information I extracted from the 12 documents, using ChatGPT-4-o1 and a few rounds of prompting:
- LLMs are non-deterministic, so the only way to ensure quality and stability is through evaluation, observability, and tight control over prompt and logic changes.
- Prompt engineering is not optional or secondary—it’s where much of the actual ‘programming’ happens in LLM-based systems.
- Human oversight is essential because LLMs are confident but imperfect; graceful failure modes and fallback strategies increase trust.
- The narrower the scope of an agent, the better it can perform—generality breeds ambiguity, hallucinations, and brittleness.
- Treating agents like standalone products forces you to think about UX, feedback loops, and lifecycle management—essential for real-world use.
- Agents shouldn’t pretend to be human—they’re most effective when they transparently act as power tools that users can guide and trust.
- Fast iteration with real users beats speculative perfection; field usage reveals issues you’ll never catch in the lab.
- Hardcoding model assumptions, prompt strings, or eval criteria kills agility—decoupling these allows you to adapt quickly to better models and shifting requirements.
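
To make the points about evaluation and decoupling more concrete, here is a minimal sketch of my own; it is not taken from the 12-factor-agents repository. It assumes the model name, prompt template, and pass threshold live in one config object rather than being scattered through the code, and that an eval set is run whenever any of them changes. Every name here (`AgentConfig`, `EvalCase`, `run_evals`, `is_release_ready`, `fake_model`) is hypothetical, and `fake_model` is a deterministic stand-in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class AgentConfig:
    """Everything likely to change between releases, kept out of the logic."""
    model: str              # hypothetical model identifier
    prompt_template: str    # must contain a {user_input} placeholder
    min_pass_rate: float    # eval threshold a change must meet


@dataclass(frozen=True)
class EvalCase:
    user_input: str
    expected_substring: str  # deliberately simple pass criterion


# A model call is modelled as a function (model name, prompt) -> output text.
ModelFn = Callable[[str, str], str]


def run_evals(config: AgentConfig, call_model: ModelFn,
              cases: Sequence[EvalCase]) -> float:
    """Return the fraction of eval cases whose output contains the expected text."""
    if not cases:
        raise ValueError("at least one eval case is required")
    passed = 0
    for case in cases:
        prompt = config.prompt_template.format(user_input=case.user_input)
        output = call_model(config.model, prompt)
        if case.expected_substring.lower() in output.lower():
            passed += 1
    return passed / len(cases)


def is_release_ready(config: AgentConfig, call_model: ModelFn,
                     cases: Sequence[EvalCase]) -> bool:
    """Gate a prompt or model change on the configured pass rate."""
    return run_evals(config, call_model, cases) >= config.min_pass_rate


def fake_model(model: str, prompt: str) -> str:
    """Stand-in for a real LLM client, so the sketch runs offline."""
    return "refund" if "money back" in prompt else "unknown"


config = AgentConfig(
    model="stub-model",
    prompt_template="Classify the intent: {user_input}",
    min_pass_rate=0.5,
)
cases = [
    EvalCase("I want my money back", "refund"),
    EvalCase("Where is my parcel?", "shipping"),
]
pass_rate = run_evals(config, fake_model, cases)
```

Swapping in a different model or prompt then means building a new `AgentConfig` and re-running the same cases. A substring check is only a placeholder; real eval criteria would usually be richer.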