The Big Picture

Before diving into chapters, here's the mental model that ties the whole book together.

The three levers of response quality

For any given query, the quality of a model's answer depends on a few things you can control. The book organizes most of its advice around three levers:

The instructions: how you tell the model to behave → Prompt Engineering (Ch 5)
The context: what information the model can see → RAG & Agents (Ch 6)
The model itself: its built-in knowledge and skills → Finetuning (Ch 7) + Data (Ch 8)

Plus a "free" lever that's easy to forget:

The generation/sampling settings (temperature, top-p, etc.) → Foundation Models (Ch 2)

Start cheap, escalate slowly

The book repeatedly recommends a start-simple approach. Try the cheap levers first: prompting → context (RAG) → finetuning. Only move to a more complex/expensive lever when the simpler one stops being enough. Finetuning is powerful but should rarely be your first move.

The AI engineering workflow

text

        ┌─────────────────────────────────────────────────────┐
        │  0. Should you build it at all?  (Ch 1)              │
        └─────────────────────────────────────────────────────┘
                              │ yes
                              ▼
        ┌─────────────────────────────────────────────────────┐
        │  1. Understand the model you're using   (Ch 2)      │
        └─────────────────────────────────────────────────────┘
                              │
                              ▼
        ┌─────────────────────────────────────────────────────┐
        │  2. Build EVALUATION first   (Ch 3–4)               │
        │     (you can't improve what you can't measure)      │
        └─────────────────────────────────────────────────────┘
                              │
                              ▼
   ┌────────────── Improve quality (Ch 5–8) ──────────────────┐
   │  Prompt → Context (RAG/Agents) → Finetune → Better data  │
   └──────────────────────────────────────────────────────────┘
                              │
                              ▼
        ┌─────────────────────────────────────────────────────┐
        │  3. Optimize inference: faster & cheaper   (Ch 9)   │
        └─────────────────────────────────────────────────────┘
                              │
                              ▼
        ┌─────────────────────────────────────────────────────┐
        │  4. Assemble architecture + feedback loop  (Ch 10)  │
        └─────────────────────────────────────────────────────┘

Why evaluation comes so early

Evaluation gets two whole chapters and is called "the hardest, if not the hardest, challenge of AI engineering." The reason: foundation models are open-ended. There's often no single correct answer, outputs are in free-form text, and the same input can give different outputs each time.

If you can't measure quality reliably, every other improvement (prompting, RAG, finetuning) becomes guesswork. So the book's advice is: invest in evaluation before you invest in fancy techniques.

The four big trade-offs you'll keep meeting

Trade-off	Shows up in
Quality vs. cost vs. latency	Model choice, inference optimization
Build vs. buy (host yourself vs. use an API)	Architecture, inference
Simple vs. powerful (prompt vs. RAG vs. agent vs. finetune)	The whole improvement loop
General vs. specialized (big model vs. small finetuned model)	Finetuning, distillation

Keep these in mind, and the rest of the book reads like a series of answers to "which side of the trade-off should I pick, and why?"

Now, on to Chapter 1.

The Big Picture ​

The three levers of response quality ​

The AI engineering workflow ​

Why evaluation comes so early ​

The four big trade-offs you'll keep meeting ​

The Big Picture

The three levers of response quality

The AI engineering workflow

Why evaluation comes so early

The four big trade-offs you'll keep meeting