Cheat Sheet
A one-page refresher of the whole book. Use it to jog your memory or decide your next move.
The mental model
Improve a model's answer using four levers, cheapest first:
- Sampling settings (temperature, top-p), free (Ch 2)
- Prompt: instructions & examples (Ch 5)
- Context: RAG & agents (Ch 6)
- The model: finetuning & data (Ch 7–8)
Golden rule: start simple, escalate only when the simpler lever runs out. Prompt → RAG → Finetune.
Decision guide
| If you need to… | Use | Chapter |
|---|---|---|
| Change tone/format/behavior quickly | Prompting | 5 |
| Give the model facts/knowledge it lacks | RAG | 6 |
| Have the model do things (multi-step, tools) | Agents | 6 |
| Teach a durable skill or shrink to a specialist | Finetuning (LoRA/QLoRA) | 7 |
| Fix "answers are wrong/made up" | Grounding (RAG) + lower temp + eval | 2,3,6 |
| Make it cheaper/faster | Quantization, distillation, batching, caching | 9 |
| Know if any change actually helped | Evaluation pipeline | 3,4 |
RAG vs. Finetuning (the classic question)
- RAG = give the model knowledge → use when info is missing, changing, or private.
- Finetuning = teach the model a skill/behavior → use for format, tone, style, or specialization.
- They're complementary. RAG is usually cheaper and easier, try it first.
Evaluation checklist
- [ ] Defined measurable criteria tied to user value
- [ ] Built a versioned eval set from real examples (incl. edge cases)
- [ ] Chose methods per criterion (exact / lexical / semantic / AI judge / functional)
- [ ] Controlled AI-judge bias (pairwise, rubrics, pinned model)
- [ ] Automated the pipeline; run on every change
- [ ] Decontaminated eval data from training/prompt data
- [ ] Monitoring + A/B tests in production
Prompt engineering checklist
- [ ] Clear, specific task + constraints + audience
- [ ] Role/persona set in system prompt
- [ ] Few-shot examples for tricky formats
- [ ] Ask for structured output (with schema)
- [ ] Decompose complex tasks; chain prompts
- [ ] Allow "I don't know" to cut hallucinations
- [ ] Important info at start/end (avoid lost-in-the-middle)
- [ ] Version prompts; test against eval set
Security must-dos
- Separate trusted instructions from untrusted input/external data
- Filter inputs (injection/PII) and outputs (leaks/unsafe)
- Least privilege for agents/tools; sandbox code; approve risky actions
- Never put real secrets in the prompt; assume context can leak
- Red-team and monitor continuously
Inference optimization toolbox
- Model level: quantization (best ROI), distillation, pruning, MoE
- Service level: continuous batching, KV/prefix cache, parallelism, speculative decoding
- API vs. self-host: API optimizes for you; self-hosting = control + scale economics, but your responsibility
Production architecture build order
Simple call → + Context (RAG/agents) → + Guardrails → + Router/Gateway → + Cache → + Orchestration → + Monitoring
…plus a user feedback loop (explicit + implicit signals) that feeds evaluation and finetuning data.
The 10 chapters in one line each
- Building AI Apps: scale created foundation models & a new discipline; ask "should I build this?"
- Foundation Models: data + architecture/scale + post-training shape behavior; sampling explains quirks.
- Evaluation Methodology: perplexity, similarity, and AI-as-a-judge; evaluation is the hardest part.
- Evaluating AI Systems: define criteria, distrust benchmarks, build your own eval pipeline.
- Prompt Engineering: cheapest lever; best practices + prompt security.
- RAG and Agents: give the model context (retrieval) and actions (tools).
- Finetuning: change the model itself; LoRA/QLoRA make it affordable.
- Dataset Engineering: data is the bottleneck; acquire, synthesize, clean, and measure quality.
- Inference Optimization: make it faster/cheaper at model and service levels.
- Architecture & Feedback: assemble the system; close the feedback loop to keep improving.