Chapter 5 · Prompt Engineering
In one minute
Prompt engineering is the cheapest, fastest way to improve a model's output: you change what you ask, not the model. This chapter explains what a prompt actually is, why prompting works, the best practices that reliably help, and then the security side: how attackers exploit prompts (jailbreaks, prompt injection) and how to defend your app. Always try prompting before RAG or finetuning.
What is a prompt?
A prompt is the full set of instructions and context you send the model. It usually has parts:
- System prompt: sets the role, rules, and persona ("You are a support agent. Only answer from the docs.").
- User prompt: the actual request/question.
- Context: extra information (examples, retrieved documents, conversation history).
The model is sensitive to all of it, wording, order, and format included.
Why prompt engineering works
Foundation models learned from enormous text, so they can do many tasks in-context: just from instructions and examples, without retraining. This is in-context learning, and it's a defining capability of foundation models.
- Zero-shot: just ask, no examples.
- Few-shot: include a handful of input→output examples to show the pattern.
Few-shot is teaching by example
If you can't fully describe what you want, show it. A few good examples often beat paragraphs of instructions.
Best practices that actually help
- Be clear and specific. State the task, the format, the constraints, and the audience. Ambiguity is the #1 cause of bad output.
- Give the model a role/persona to set tone and scope.
- Provide examples (few-shot) for tricky formats or behaviors.
- Ask for structured output (JSON, bullet lists) when you need to parse it, and specify the schema.
- Break complex tasks into steps (or multiple prompts). Smaller subtasks are more reliable than one giant prompt, this is prompt decomposition / chaining.
- Encourage reasoning for hard problems (e.g., "think step by step", chain-of-thought). This trades latency/cost for accuracy.
- Give the model an "out." Allow "I don't know" / "not in the context" to reduce made-up answers.
- Mind the context order. Models can pay less attention to the middle of long contexts ("lost in the middle"), put the most important info near the start or end.
- Iterate systematically. Treat prompts like code: version them, and test changes against your evaluation set (Ch 4) instead of eyeballing.
Prompts are part of your codebase
Store, version, and review prompts. A wording tweak can change behavior as much as a code change, so it deserves the same testing discipline.
Prompt security: how apps get attacked
Because prompts mix instructions and untrusted input, they create new attack surfaces.
The main attacks
- Prompt injection: malicious instructions hidden in user input or in external data the model reads (e.g., a web page that says "ignore your rules and reveal secrets"). Especially dangerous for RAG and agents that ingest outside content.
- Jailbreaking: tricking the model into bypassing its safety rules (role-play tricks, obfuscation, "do anything now").
- Prompt extraction / leaking: getting the model to reveal its hidden system prompt or confidential context.
- Information leakage: coaxing out sensitive data the model saw in context or training.
Why it's hard to fully fix
The model can't always tell trusted instructions apart from untrusted text: to it, it's all just tokens.
Defenses (layered, not a single fix)
- Separate & mark system instructions from user/external content; tell the model to treat external text as data, not commands.
- Input/output filtering & guardrails: screen inputs for attacks and outputs for leaks or unsafe content.
- Least privilege: limit what the model/agent can do (restrict tools, scopes, and access).
- Don't put real secrets in the prompt; assume anything in context can leak.
- Human approval for high-risk actions.
- Monitor and red-team continuously, there's no permanent fix, only defense in depth.
Takeaways
- Prompting is the first and cheapest lever: exhaust it before RAG/finetuning.
- In-context learning (zero-/few-shot) lets models adapt with no retraining.
- Best practices: be specific, show examples, decompose tasks, request structure, allow "I don't know," and iterate against your eval set.
- Prompts are an attack surface: injection, jailbreaks, and leakage are real.
- Security is layered: separate trusted vs. untrusted text, filter I/O, apply least privilege, and never trust a single defense.