Chapter 10 · AI Engineering Architecture and User Feedback

In one minute

The final chapter assembles everything into a real, end-to-end system. It builds the AI application architecture step by step, starting from the simplest "just call a model" and adding pieces (context, guardrails, routing, caching, monitoring, orchestration) only as needed. The second half is product-focused: how to design a user feedback system that keeps improving your app without hurting the user experience. Feedback closes the loop that makes everything else compound.

The architecture, built up step by step

The book's philosophy: start simple, add complexity only when a real problem demands it.

Step 0, The simplest app

User → [Your App] → [Model API] → response

Just send a prompt, get a response. Ship this first. Most teams over-engineer too early.
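A minimal sketch of Step 0 in Python. The `fake_model` function is a hypothetical stand-in, not a real provider API; in practice you would replace it with a call to your provider's SDK.

```python
from typing import Callable

# Assumption: a "model" is any function from prompt text to reply text.
ModelFn = Callable[[str], str]


def fake_model(prompt: str) -> str:
    """Hypothetical placeholder so the sketch runs offline."""
    return f"(model reply to: {prompt})"


def answer(user_message: str, model: ModelFn = fake_model) -> str:
    """Step 0: forward the user's message to the model and return its reply."""
    return model(user_message)


print(answer("What is RAG?"))
```

Passing the model in as a function keeps the later steps (context, guardrails, routing) easy to wrap around this call.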

Step 1, Add context (the model's inputs)

Enhance prompts with the right information:

RAG retrievers, agent tools, conversation memory (Ch 6).
This is usually the first and biggest quality upgrade.
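Adding context can be sketched as "retrieve, then assemble the prompt". The retriever below ranks by plain word overlap, which is only a toy stand-in for a real RAG retriever, and the names `retrieve` and `build_prompt` are illustrative assumptions.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank documents by how many query words they share."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in documents]
    # sorted() is stable, so ties keep their original document order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked[:k] if score > 0]


def build_prompt(query: str, context: list[str], history: list[str]) -> str:
    """Assemble retrieved context and conversation memory ahead of the question."""
    parts = []
    if context:
        parts.append("Context:\n" + "\n".join(f"- {c}" for c in context))
    if history:
        parts.append("Conversation so far:\n" + "\n".join(history))
    parts.append(f"Question: {query}")
    return "\n\n".join(parts)


docs = [
    "Refunds are processed within 5 days.",
    "Our office is in Berlin.",
    "Refunds require a receipt.",
]
query = "how do refunds work"
print(build_prompt(query, retrieve(query, docs), history=["User: hello"]))
```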

Step 2, Add guardrails (safety on both sides)

Input guardrails: block malicious inputs, inputs containing PII, and prompt-injection attempts (Ch 5).
Output guardrails: catch unsafe, off-brand, low-quality, or malformed outputs before they reach users.
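A rough sketch of guardrails on both sides of the model call. The phrase list, the email regex, and the length limit are illustrative assumptions; real guardrails typically use much more robust detectors than these string checks.

```python
import re
from typing import Callable

# Assumptions for illustration only: a tiny phrase list and a simple email regex.
INJECTION_PHRASES = ("ignore previous instructions", "reveal your system prompt")
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def check_input(text: str) -> tuple[bool, str]:
    """Input guardrail: reject likely prompt injection and obvious PII."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in INJECTION_PHRASES):
        return False, "possible prompt injection"
    if EMAIL_PATTERN.search(text):
        return False, "contains an email address (PII)"
    return True, "ok"


def check_output(text: str, max_chars: int = 2000) -> tuple[bool, str]:
    """Output guardrail: reject empty or oversized responses."""
    if not text.strip():
        return False, "empty output"
    if len(text) > max_chars:
        return False, "output too long"
    return True, "ok"


def guarded_call(user_message: str, model: Callable[[str], str]) -> str:
    """Run the input check, call the model, then run the output check."""
    ok, reason = check_input(user_message)
    if not ok:
        return f"Request blocked: {reason}."
    reply = model(user_message)
    ok, _ = check_output(reply)
    if not ok:
        return "Sorry, I could not produce a usable answer."
    return reply
```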

Step 3, Add a model router and gateway

Router: send different queries to different models (cheap model for easy queries, strong model for hard ones; specialized models per task). Balances cost vs. quality.
Gateway: a unified, secure entry point to manage models, keys, rate limits, and logging.
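One way to picture the two pieces: the router picks a model name, and the gateway is the single place where calls are counted, logged, and dispatched. The keyword heuristic, the `Gateway` class, and the call limit below are all hypothetical simplifications; production routers often use a classifier rather than keywords.

```python
from dataclasses import dataclass, field
from typing import Callable

ModelFn = Callable[[str], str]

# Assumption: a few keywords are enough to flag a "hard" query in this toy example.
HARD_KEYWORDS = ("debug", "prove", "step by step")


def route(query: str, word_threshold: int = 30) -> str:
    """Toy router: long or hard-looking queries go to the strong model."""
    lowered = query.lower()
    looks_hard = len(query.split()) > word_threshold or any(k in lowered for k in HARD_KEYWORDS)
    return "strong" if looks_hard else "cheap"


@dataclass
class Gateway:
    """Single entry point that owns the model registry, a call limit, and a log."""
    models: dict[str, ModelFn]
    max_calls: int = 100
    log: list[tuple[str, str]] = field(default_factory=list)

    def call(self, model_name: str, prompt: str) -> str:
        if len(self.log) >= self.max_calls:
            raise RuntimeError("rate limit reached")
        self.log.append((model_name, prompt))
        return self.models[model_name](prompt)


gateway = Gateway(models={
    "cheap": lambda p: f"cheap model: {p}",
    "strong": lambda p: f"strong model: {p}",
})
for q in ("What time is it?", "Please debug this function"):
    print(gateway.call(route(q), q))
```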

Step 4, Add caching

Cache responses (and intermediate results / prompt prefixes) to cut cost and latency (ties to Ch 9).
Exact caching reuses a response for an identical request; semantic caching reuses answers for similar questions. Both need care about staleness and correctness.
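The two cache styles can be sketched as follows. The exact cache uses a time-to-live to limit staleness. The semantic cache here uses Jaccard word overlap purely as a stand-in for embedding similarity, and its threshold is an arbitrary assumption: set it too low and users get answers to questions they did not ask.

```python
import time
from typing import Optional


class ExactCache:
    """Reuse a response only for the same prompt (after whitespace/case normalization)."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    @staticmethod
    def _key(prompt: str) -> str:
        return " ".join(prompt.lower().split())

    def put(self, prompt: str, response: str, now: Optional[float] = None) -> None:
        stored_at = time.time() if now is None else now
        self._store[self._key(prompt)] = (stored_at, response)

    def get(self, prompt: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        entry = self._store.get(self._key(prompt))
        if entry is None:
            return None
        stored_at, response = entry
        # Entries older than the TTL are treated as stale.
        return response if now - stored_at <= self.ttl else None


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a toy substitute for embedding cosine similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


class SemanticCache:
    """Reuse the answer of the most similar stored prompt, if similar enough."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries: list[tuple[str, str]] = []

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((prompt, response))

    def get(self, prompt: str) -> Optional[str]:
        best = max(self.entries, key=lambda e: jaccard(prompt, e[0]), default=None)
        if best is not None and jaccard(prompt, best[0]) >= self.threshold:
            return best[1]
        return None
```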

Step 5, Add complex logic & orchestration

Multi-step flows, agentic loops, write-actions, retries, and fallbacks.
An orchestrator ties components together (retrievers, tools, models, guardrails) into a coherent pipeline.
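Retries and fallbacks are among the simplest pieces of orchestration logic. The sketch below assumes each model is a plain function that may raise; it omits backoff delays and error classification, which a real orchestrator would likely need.

```python
from typing import Callable

ModelFn = Callable[[str], str]


def call_with_retries(model: ModelFn, prompt: str, max_attempts: int = 3) -> str:
    """Call the model, retrying on any exception up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return model(prompt)
        except Exception as error:  # sketch only: real code would catch narrower errors
            last_error = error
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error


def call_with_fallback(models: list[ModelFn], prompt: str, max_attempts: int = 2) -> str:
    """Try each model in order; move to the next one when retries are exhausted."""
    for model in models:
        try:
            return call_with_retries(model, prompt, max_attempts)
        except RuntimeError:
            continue
    raise RuntimeError("every model in the chain failed")


def broken_model(prompt: str) -> str:
    """Hypothetical primary model that is currently down."""
    raise TimeoutError("simulated outage")


print(call_with_fallback([broken_model, lambda p: f"backup answer to: {p}"], "hello"))
```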

Step 6, Add observability & monitoring

Log inputs, outputs, context, latency, cost, and errors.
Monitor quality, drift, safety, and spend in production (extends Ch 4's "evaluation in production").
Observability is what lets you debug and improve a system you can't fully predict.
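A minimal sketch of per-call logging. The `CallRecord` fields mirror the list above, while the per-character cost rate is a made-up number for illustration; real cost tracking would use the provider's token counts and pricing.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CallRecord:
    prompt: str
    response: Optional[str]
    latency_ms: float
    est_cost_usd: float
    error: Optional[str]


def logged_call(
    model: Callable[[str], str],
    prompt: str,
    records: list[CallRecord],
    usd_per_1k_chars: float = 0.001,  # assumption: illustrative rate, not real pricing
) -> Optional[str]:
    """Call the model and append one record with input, output, latency, cost, and error."""
    start = time.perf_counter()
    response, error = None, None
    try:
        response = model(prompt)
    except Exception as exc:
        # Sketch choice: record the failure and return None instead of re-raising.
        error = repr(exc)
    latency_ms = (time.perf_counter() - start) * 1000
    chars = len(prompt) + len(response or "")
    records.append(CallRecord(prompt, response, latency_ms, chars / 1000 * usd_per_1k_chars, error))
    return response


records: list[CallRecord] = []
logged_call(lambda p: "hello", "hi", records)
error_rate = sum(r.error is not None for r in records) / len(records)
print(f"calls={len(records)} error_rate={error_rate:.0%}")
```

Aggregating such records over time (error rate, latency percentiles, spend) is what turns raw logs into monitoring.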

The architecture is a checklist, not a mandate

You don't need all of this on day one. Add context → guardrails → routing → caching → orchestration → monitoring in response to actual pain. Premature complexity is a common failure.

The other half: user feedback

Feedback is the fuel for continual improvement: it's how you get real-world evaluation data and training data (Ch 4 and Ch 8 depend on it).

Why feedback is special for AI

AI outputs are open-ended, so you often can't tell quality without user signals.
Good feedback becomes your eval set and your finetuning data: a compounding advantage (a "data flywheel").

Types of feedback

Explicit: thumbs up/down, ratings, corrections, reports. Clear but sparse (most users won't bother).
Implicit: behavior that reveals satisfaction: did they accept the suggestion, copy the answer, retry/rephrase, abandon, or continue the conversation? Abundant but noisy/ambiguous.
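One possible way to record both kinds of feedback in a single event stream. The signal names and the `FeedbackEvent` shape are assumptions chosen to match the examples above, not a standard schema.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    EXPLICIT = "explicit"
    IMPLICIT = "implicit"


# Assumption: illustrative signal names based on the examples in the text.
EXPLICIT_SIGNALS = {"thumbs_up", "thumbs_down", "rating", "correction", "report"}
IMPLICIT_SIGNALS = {"accepted", "copied", "rephrased", "abandoned", "continued"}


@dataclass(frozen=True)
class FeedbackEvent:
    response_id: str
    kind: Kind
    signal: str


def make_event(response_id: str, signal: str) -> FeedbackEvent:
    """Classify a raw signal as explicit or implicit and wrap it in an event."""
    if signal in EXPLICIT_SIGNALS:
        kind = Kind.EXPLICIT
    elif signal in IMPLICIT_SIGNALS:
        kind = Kind.IMPLICIT
    else:
        raise ValueError(f"unknown signal: {signal}")
    return FeedbackEvent(response_id, kind, signal)


events = [make_event("r1", s) for s in ("copied", "continued", "rephrased", "thumbs_up")]
print(Counter(e.kind.value for e in events))
```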

Designing a good feedback system

Make it effortless: low-friction ways to signal quality.
Capture implicit signals: they're where the volume is.
Ask at the right moment: request feedback when it's natural, not annoying.
Don't degrade UX: feedback collection must never get in the user's way.
Watch for bias: who gives feedback isn't representative; loud complaints ≠ the average user.
Close the loop: route feedback into evaluation, prompt/RAG improvements, and finetuning data.

Feedback can mislead

Implicit signals are biased and ambiguous (a user might copy a wrong answer). Combine multiple signals, and validate before treating feedback as ground truth.
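Combining signals before trusting them might look like the sketch below. The weights, the minimum signal count, and the margin are invented for illustration; in practice they would need to be calibrated against human-reviewed examples.

```python
# Assumption: hand-picked weights; explicit signals count more than implicit ones.
SIGNAL_WEIGHTS = {
    "thumbs_up": 1.0,
    "thumbs_down": -1.0,
    "accepted": 0.5,
    "copied": 0.3,
    "rephrased": -0.4,
    "abandoned": -0.3,
}


def label_response(signals: list[str], min_signals: int = 2, margin: float = 0.5) -> str:
    """Label a response only when several signals agree; otherwise send it to review."""
    known = [s for s in signals if s in SIGNAL_WEIGHTS]
    if len(known) < min_signals:
        return "needs_review"  # a single signal (e.g. one copy) is too ambiguous
    score = sum(SIGNAL_WEIGHTS[s] for s in known)
    if score >= margin:
        return "likely_good"
    if score <= -margin:
        return "likely_bad"
    return "needs_review"  # conflicting signals cancel out


for observed in (["copied"], ["copied", "accepted"], ["copied", "rephrased"]):
    print(observed, "->", label_response(observed))
```

Anything labeled `needs_review` would go to validation (for example, human review) rather than straight into an eval set or finetuning data.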

The full loop

   Build → Deploy → Collect feedback → Evaluate → Improve (prompt/RAG/finetune/data)
     ▲                                                              │
     └──────────────────────────────────────────────────────────────┘

This is the heartbeat of AI engineering: a product that gets better the more it's used, because feedback continuously feeds evaluation and improvement.

Takeaways

Start with the simplest architecture and add components only when a real problem appears.
The build-up order: context → guardrails → router/gateway → cache → orchestration → monitoring.
Routing + caching are major cost/latency levers; guardrails + monitoring keep it safe and observable.
User feedback is the fuel for continual improvement; capture both explicit and implicit signals.
Design feedback to be low-friction and UX-safe, watch for bias, and close the loop back into eval and data.
Great AI products are systems with feedback loops, not one-off model calls.
