The Big Picture
Before diving into the chapters, here's the mental model that ties the whole book together.
The three levers of response quality
For any given query, the quality of a model's answer depends on a few things you can control. The book organizes most of its advice around three levers:
- The instructions: how you tell the model to behave → Prompt Engineering (Ch 5)
- The context: what information the model can see → RAG & Agents (Ch 6)
- The model itself: its built-in knowledge and skills → Finetuning (Ch 7) + Data (Ch 8)
Plus a "free" lever that's easy to forget:
- The generation/sampling settings (temperature, top-p, etc.) → Foundation Models (Ch 2)
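To make the sampling lever concrete, here is a minimal sketch of how temperature and top-p could shape the choice of the next token. It is an illustration under stated assumptions, not how any particular model or API implements sampling: the `sample_token` function, its parameters, and the toy logits are all hypothetical.

```python
import math
import random

def sample_token(logits, temperature=1.0, top_p=1.0, rng=None):
    """Sample one token from a {token: logit} dict with temperature and top-p."""
    rng = rng or random.Random()
    if temperature <= 0:
        # Assumption: treat temperature 0 as greedy decoding (pick the top logit).
        return max(logits, key=logits.get)
    # Temperature scaling, then a numerically stable softmax.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    # Top-p: keep the smallest set of most likely tokens whose mass reaches top_p.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, mass = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        mass += p
        if mass >= top_p:
            break
    tokens, weights = zip(*kept)
    # random.choices renormalizes the weights of the surviving tokens.
    return rng.choices(tokens, weights=weights, k=1)[0]

# Toy logits, made up for illustration.
toy_logits = {"cat": 2.0, "dog": 1.0, "eel": 0.1}
print(sample_token(toy_logits, temperature=0))             # always the top token
print(sample_token(toy_logits, temperature=2.0))           # flatter, more varied
print(sample_token(toy_logits, temperature=1.0, top_p=0.5))  # only the most likely tokens survive
```

Lower temperature and lower top-p make outputs more predictable; higher values make them more varied. Changing them requires no new prompt, data, or training, which is why this lever is "free".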
Start cheap, escalate slowly
The book repeatedly recommends a start-simple approach. Try the cheap levers first: prompting → context (RAG) → finetuning. Only move to a more complex/expensive lever when the simpler one stops being enough. Finetuning is powerful but should rarely be your first move.
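The escalation order can be sketched as a simple loop. This is only an illustration of the idea, not a procedure taken from the book: `choose_lever`, the `evaluate` callback, and the scores in the usage example are hypothetical, and `evaluate(lever)` is assumed to return the quality score of the system once that lever (and the cheaper ones before it) has been applied.

```python
from typing import Callable, Optional

# Levers ordered from cheapest to most expensive.
LEVERS = ["prompting", "context (RAG)", "finetuning"]

def choose_lever(evaluate: Callable[[str], float], target: float) -> Optional[str]:
    """Return the first (cheapest) lever whose score meets the target, else None."""
    for lever in LEVERS:
        if evaluate(lever) >= target:
            return lever  # stop escalating as soon as quality is good enough
    return None  # no lever was enough; revisit the target or the use case

# Usage with made-up scores, purely for illustration.
made_up_scores = {"prompting": 0.71, "context (RAG)": 0.86, "finetuning": 0.90}
print(choose_lever(made_up_scores.get, target=0.85))
```

The loop only works if `evaluate` is trustworthy, which is one reason the workflow puts evaluation before any of the improvement levers.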
The AI engineering workflow
┌─────────────────────────────────────────────────────┐
│  0. Should you build it at all? (Ch 1)              │
└─────────────────────────────────────────────────────┘
                           │ yes
                           ▼
┌─────────────────────────────────────────────────────┐
│  1. Understand the model you're using (Ch 2)        │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│  2. Build EVALUATION first (Ch 3–4)                 │
│     (you can't improve what you can't measure)      │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
┌────────────── Improve quality (Ch 5–8) ──────────────────┐
│  Prompt → Context (RAG/Agents) → Finetune → Better data  │
└──────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│  3. Optimize inference: faster & cheaper (Ch 9)     │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│  4. Assemble architecture + feedback loop (Ch 10)   │
└─────────────────────────────────────────────────────┘
Why evaluation comes so early
Evaluation gets two whole chapters and is called "one of the hardest, if not the hardest, challenges of AI engineering." The reason: foundation models are open-ended. There's often no single correct answer, outputs are free-form text, and the same input can give different outputs each time.
If you can't measure quality reliably, every other improvement (prompting, RAG, finetuning) becomes guesswork. So the book's advice is: invest in evaluation before you invest in fancy techniques.
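As a rough sketch of what "invest in evaluation first" can look like in practice, the harness below scores a system against a small set of test cases and samples each prompt several times, because the same input can produce different outputs. Everything here is an assumption for illustration: `pass_rate`, `exact_match`, and the `generate` callable are hypothetical names, and exact match is only a stand-in scorer, since open-ended tasks usually need something more forgiving.

```python
from typing import Callable, Iterable, Tuple

def exact_match(output: str, expected: str) -> bool:
    """Stand-in scorer: case-insensitive match after trimming whitespace."""
    return output.strip().lower() == expected.strip().lower()

def pass_rate(
    generate: Callable[[str], str],
    cases: Iterable[Tuple[str, str]],
    scorer: Callable[[str, str], bool] = exact_match,
    samples_per_case: int = 3,
) -> float:
    """Fraction of sampled outputs that the scorer accepts."""
    passed = 0
    total = 0
    for prompt, expected in cases:
        # Sample repeatedly: one lucky or unlucky output should not decide the score.
        for _ in range(samples_per_case):
            passed += scorer(generate(prompt), expected)
            total += 1
    return passed / total if total else 0.0

# Usage with a fake model, so the sketch runs without any API.
def fake_model(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "Paris"

test_cases = [("What is 2 + 2?", "4"), ("Capital of France?", "paris")]
print(pass_rate(fake_model, test_cases))
```

With a number like this in hand, a change to the prompt, the retrieved context, or the model can be compared against the previous version instead of judged by feel.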
The four big trade-offs you'll keep meeting
| Trade-off | Shows up in |
|---|---|
| Quality vs. cost vs. latency | Model choice, inference optimization |
| Build vs. buy (host yourself vs. use an API) | Architecture, inference |
| Simple vs. powerful (prompt vs. RAG vs. agent vs. finetune) | The whole improvement loop |
| General vs. specialized (big model vs. small finetuned model) | Finetuning, distillation |
Keep these in mind, and the rest of the book reads like a series of answers to "which side of the trade-off should I pick, and why?"
Now, on to Chapter 1.