Chapter 10 · AI Engineering Architecture and User Feedback
In one minute
The final chapter assembles everything into a real, end-to-end system. It builds the AI application architecture step by step, starting from the simplest "just call a model" and adding pieces (context, guardrails, routing, caching, monitoring, orchestration) only as needed. The second half is product-focused: how to design a user feedback system that keeps improving your app without hurting the user experience. Feedback closes the loop that makes everything else compound.
The architecture, built up step by step
The book's philosophy: start simple, add complexity only when a real problem demands it.
Step 0, The simplest app
User → [Your App] → [Model API] → response
Just send a prompt, get a response. Ship this first. Most teams over-engineer too early.
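A minimal sketch of Step 0, assuming the model is reachable through a single function call. `fake_model` and `answer` are hypothetical names; `fake_model` stands in for whatever provider SDK call you actually use, so the example runs offline.

```python
from typing import Callable


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model API call."""
    return f"echo: {prompt}"


def answer(user_message: str, call_model: Callable[[str], str] = fake_model) -> str:
    """Step 0: forward the user's message to the model and return its reply."""
    return call_model(user_message)
```

Passing the model in as a function keeps the app testable and makes it easy to swap in a router or gateway later.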
Step 1, Add context (the model's inputs)
Enhance prompts with the right information:
- RAG retrievers, agent tools, conversation memory (Ch 6).
- This is usually the first and biggest quality upgrade.
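One way to picture Step 1 is as prompt assembly. The sketch below assumes a toy word-overlap retriever in place of a real RAG retriever, and keeps only the last three conversation turns as memory. `retrieve`, `build_prompt`, and the prompt layout are illustrative, not a prescribed design.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank documents by how many words they share with the query."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in documents]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no words with the query.
    return [doc for score, doc in ranked[:k] if score > 0]


def build_prompt(query: str, documents: list[str], history: list[str]) -> str:
    """Combine retrieved context and recent conversation turns with the question."""
    context = "\n".join(retrieve(query, documents)) or "(no relevant context found)"
    memory = "\n".join(history[-3:]) or "(no previous turns)"
    return (
        f"Context:\n{context}\n\n"
        f"Conversation so far:\n{memory}\n\n"
        f"Question: {query}"
    )
```

A production retriever would typically use embeddings or a search index; the point here is only that context is gathered first and then placed in the prompt.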
Step 2, Add guardrails (safety on both sides)
- Input guardrails: block malicious inputs, PII, and prompt injection attempts (Ch 5).
- Output guardrails: catch unsafe, off-brand, low-quality, or malformed outputs before they reach users.
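A rough sketch of both guardrail sides, under simplifying assumptions: the input check uses a short phrase list and an email regex (real systems usually rely on classifiers and dedicated PII detectors), and the output check only verifies that the reply is JSON with an `answer` field. The phrase list, the `[EMAIL]` placeholder, and the `answer` field are all hypothetical.

```python
import json
import re
from typing import Optional

EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Illustrative phrases only; a real injection detector needs far more than this.
INJECTION_PHRASES = ("ignore previous instructions", "reveal your system prompt")


def check_input(text: str) -> tuple[bool, str]:
    """Return (allowed, text_or_reason). Blocks injection phrases, masks emails."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in INJECTION_PHRASES):
        return False, "possible prompt injection"
    return True, EMAIL_PATTERN.sub("[EMAIL]", text)


def check_output(raw: str) -> tuple[bool, Optional[dict]]:
    """Return (ok, parsed). Rejects replies that are not JSON objects with 'answer'."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return False, None
    if not isinstance(parsed, dict) or "answer" not in parsed:
        return False, None
    return True, parsed
```

When the output check fails, a common choice is to retry the call or return a safe fallback message instead of showing the malformed reply.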
Step 3, Add a model router and gateway
- Router: send different queries to different models (cheap model for easy queries, strong model for hard ones; specialized models per task). Balances cost vs. quality.
- Gateway: a unified, secure entry point to manage models, keys, rate limits, and logging.
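The routing idea can be sketched with a simple heuristic. This assumes two hypothetical model names (`cheap-model`, `strong-model`) and treats long queries or queries containing certain hint words as hard. Real routers often use a trained classifier instead. The gateway (keys, rate limits, logging) is not shown.

```python
# Hypothetical hint words that suggest a query needs the stronger model.
HARD_HINTS = ("prove", "debug", "step by step", "analyze")


def route(query: str, max_easy_words: int = 30) -> str:
    """Pick a model name: cheap for short, simple queries; strong otherwise."""
    lowered = query.lower()
    is_long = len(lowered.split()) > max_easy_words
    looks_hard = any(hint in lowered for hint in HARD_HINTS)
    return "strong-model" if (is_long or looks_hard) else "cheap-model"
```

The threshold and hint list are the knobs that trade cost against quality; they would normally be tuned against evaluation data rather than guessed.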
Step 4, Add caching
- Cache responses (and intermediate results / prompt prefixes) to cut cost and latency (ties to Ch 9).
- Exact caching and semantic caching (reuse answers for similar questions), with care about staleness and correctness.
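A minimal sketch of exact caching with a time-to-live to limit staleness. It assumes that lowercasing and collapsing whitespace is an acceptable normalization for your prompts; `ExactCache` is a hypothetical class, and semantic caching (matching similar questions, usually via embeddings) is not shown.

```python
import hashlib
import time
from typing import Callable, Optional


class ExactCache:
    """In-memory response cache keyed by a hash of the normalized prompt."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: dict[str, tuple[float, str]] = {}

    @staticmethod
    def _key(prompt: str) -> str:
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, prompt: str) -> Optional[str]:
        key = self._key(prompt)
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, response = entry
        if self.clock() - stored_at > self.ttl_seconds:
            del self._store[key]  # entry is stale: evict it and report a miss
            return None
        return response

    def put(self, prompt: str, response: str) -> None:
        self._store[self._key(prompt)] = (self.clock(), response)
```

Typical use is to call `get` before the model and `put` after a successful response. A shorter TTL lowers the risk of serving outdated answers at the price of fewer hits.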
Step 5, Add complex logic & orchestration
- Multi-step flows, agentic loops, write-actions, retries, and fallbacks.
- An orchestrator ties components together (retrievers, tools, models, guardrails) into a coherent pipeline.
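Retries and fallbacks are the simplest piece of orchestration to show. The sketch below assumes each model is a plain callable that raises on failure; `call_with_fallback` is a hypothetical helper, and it omits backoff delays and error-type filtering that a production version would likely want.

```python
from typing import Callable, Optional, Sequence


def call_with_fallback(
    prompt: str,
    models: Sequence[Callable[[str], str]],
    retries_per_model: int = 2,
) -> str:
    """Try each model in order, retrying a few times before falling back."""
    last_error: Optional[Exception] = None
    for model in models:
        for _ in range(retries_per_model):
            try:
                return model(prompt)
            except Exception as error:  # broad on purpose: any failure triggers a retry
                last_error = error
    raise RuntimeError("all models failed") from last_error
```

Listing the preferred model first and a cheaper or more reliable one second gives a basic degradation path when the primary is unavailable.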
Step 6, Add observability & monitoring
- Log inputs, outputs, context, latency, cost, and errors.
- Monitor quality, drift, safety, and spend in production (extends Ch 4's "evaluation in production").
- Observability is what lets you debug and improve a system you can't fully predict.
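A sketch of per-call logging, assuming a structured JSON record appended to an in-memory list. `logged_call` and the field names are hypothetical. Character counts stand in as a crude size proxy; real cost would normally come from the token usage your provider reports.

```python
import json
import time
from typing import Callable


def logged_call(prompt: str, call_model: Callable[[str], str], log: list[str]) -> str:
    """Call the model and append one JSON record with input, output, latency, error."""
    started = time.perf_counter()
    record = {"prompt": prompt, "prompt_chars": len(prompt), "output": None, "error": None}
    try:
        record["output"] = call_model(prompt)
        return record["output"]
    except Exception as error:
        record["error"] = repr(error)
        raise  # log the failure, but do not hide it from the caller
    finally:
        # Runs on both success and failure, so every call leaves a record.
        record["latency_s"] = round(time.perf_counter() - started, 6)
        log.append(json.dumps(record))
```

In production the list would be replaced by a logging or tracing backend, and the prompt text may need redaction before it is stored.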
The architecture is a checklist, not a mandate
You don't need all of this on day one. Add context → guardrails → routing → caching → orchestration → monitoring in response to actual pain. Premature complexity is a common failure.
The other half: user feedback
Feedback is the fuel for continual improvement: it's how you get real-world evaluation data and training data (Ch 4 and Ch 8 depend on it).
Why feedback is special for AI
- AI outputs are open-ended, so you often can't tell quality without user signals.
- Good feedback becomes your eval set and your finetuning data: a compounding advantage (a "data flywheel").
Types of feedback
- Explicit: thumbs up/down, ratings, corrections, reports. Clear but sparse (most users won't bother).
- Implicit: behavior that reveals satisfaction: did they accept the suggestion, copy the answer, retry/rephrase, abandon, or continue the conversation? Abundant but noisy/ambiguous.
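The two feedback types can share one event record. This sketch assumes a fixed vocabulary of signal names taken from the examples above. `FeedbackEvent`, `make_event`, and the exact signal strings are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class FeedbackKind(Enum):
    EXPLICIT = "explicit"
    IMPLICIT = "implicit"


# Signal names mirror the examples in the list above; extend as needed.
EXPLICIT_SIGNALS = {"thumbs_up", "thumbs_down", "rating", "correction", "report"}
IMPLICIT_SIGNALS = {"accepted", "copied", "rephrased", "abandoned", "continued"}


@dataclass(frozen=True)
class FeedbackEvent:
    response_id: str  # which model response the signal refers to
    kind: FeedbackKind
    signal: str


def make_event(response_id: str, signal: str) -> FeedbackEvent:
    """Classify a raw signal name as explicit or implicit and wrap it in an event."""
    if signal in EXPLICIT_SIGNALS:
        return FeedbackEvent(response_id, FeedbackKind.EXPLICIT, signal)
    if signal in IMPLICIT_SIGNALS:
        return FeedbackEvent(response_id, FeedbackKind.IMPLICIT, signal)
    raise ValueError(f"unknown feedback signal: {signal}")
```

Tying every event to a `response_id` is what lets feedback be joined back to the logged prompt, context, and output later.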
Designing a good feedback system
- Make it effortless: low-friction ways to signal quality.
- Capture implicit signals: they're where the volume is.
- Ask at the right moment: request feedback when it's natural, not annoying.
- Don't degrade UX: feedback collection must never get in the user's way.
- Watch for bias: the users who give feedback aren't representative of everyone; loud complaints ≠ the average user.
- Close the loop: route feedback into evaluation, prompt/RAG improvements, and finetuning data.
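Closing the loop amounts to sorting feedback into the places where it gets used. The sketch below assumes each record is a dict with `prompt`, `response`, `signal`, and an optional user `correction`. The bucket names and the routing policy are hypothetical, and everything routed here is a candidate pending review, not verified ground truth.

```python
def route_feedback(records: list[dict]) -> dict[str, list[dict]]:
    """Sort feedback records into candidate buckets for later review."""
    buckets: dict[str, list[dict]] = {
        "finetune_candidates": [],  # user-corrected answers
        "eval_set": [],             # responses users approved of
        "needs_fix": [],            # responses users rejected (prompt/RAG work)
    }
    for record in records:
        pair = {"prompt": record["prompt"], "response": record["response"]}
        if record.get("correction"):
            # A correction gives a better target for the same prompt.
            corrected = {"prompt": record["prompt"], "response": record["correction"]}
            buckets["finetune_candidates"].append(corrected)
        elif record["signal"] == "thumbs_up":
            buckets["eval_set"].append(pair)
        elif record["signal"] == "thumbs_down":
            buckets["needs_fix"].append(pair)
    return buckets
```

Records with other signals are left unrouted in this sketch; in practice they would feed aggregate monitoring.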
Feedback can mislead
Implicit signals are biased and ambiguous (a user might copy a wrong answer). Combine multiple signals, and validate before treating feedback as ground truth.
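A sketch of combining signals before trusting any one of them. The weights and threshold are made-up values for illustration; the idea is only that a single weak implicit signal (such as a copy) should not count as a verdict by itself.

```python
# Hypothetical weights: explicit signals count more than implicit ones.
SIGNAL_WEIGHTS = {
    "thumbs_up": 1.0,
    "thumbs_down": -1.0,
    "accepted": 0.5,
    "copied": 0.3,
    "rephrased": -0.4,
    "abandoned": -0.5,
}


def satisfaction_score(signals: list[str]) -> float:
    """Sum the weights of all signals seen for one response (unknown ones count 0)."""
    return sum(SIGNAL_WEIGHTS.get(signal, 0.0) for signal in signals)


def label(signals: list[str], threshold: float = 0.8) -> str:
    """Label a response only when the combined evidence is strong enough."""
    score = satisfaction_score(signals)
    if score >= threshold:
        return "likely_good"
    if score <= -threshold:
        return "likely_bad"
    return "needs_review"
```

With these values, a lone `copied` signal stays in `needs_review`, while a thumbs-up plus a copy is labeled `likely_good`. Even then, a sample of labeled items should be checked by hand before they are treated as ground truth.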
The full loop
Build → Deploy → Collect feedback → Evaluate → Improve (prompt/RAG/finetune/data)
▲                                                              │
└──────────────────────────────────────────────────────────────┘
This is the heartbeat of AI engineering: a product that gets better the more it's used, because feedback continuously feeds evaluation and improvement.
Takeaways
- Start with the simplest architecture and add components only when a real problem appears.
- The build-up order: context → guardrails → router/gateway → cache → orchestration → monitoring.
- Routing + caching are major cost/latency levers; guardrails + monitoring keep it safe and observable.
- User feedback is the fuel for continual improvement: capture both explicit and implicit signals.
- Design feedback to be low-friction and UX-safe, watch for bias, and close the loop back into eval and data.
- Great AI products are systems with feedback loops, not one-off model calls.