Applying 32 GenAI design patterns to a real project
I have been working in managed cloud hosting for about seven years. In that time, I have built a few production AI systems that handle real customer conversations and diagnose server issues. One of them handles about 30% of incoming conversations on its own. Another cut investigation time for complex server issues from 30 minutes down to about 8 minutes.
When I picked up Lakshmanan and Hapke’s Generative AI Design Patterns (O’Reilly, 2025), I wanted to see how many of their 32 patterns mapped to what I want to build next: a context-aware AI hosting support agent that can answer informational questions, debug complex server issues, and suggest application code optimizations for speed and security. Twenty-eight of the 32 patterns apply. The other four don’t: two require self-hosted models, one adds cost without proportional benefit, and one belongs to a separate system that is out of scope for what we are building here.
This is a 10-part series. Every post takes one piece of the system, picks the patterns that power it, and shows the code. The implementation uses Google ADK 2.0 and runs against a real hosting platform with real servers and real consequences when the AI gets a diagnosis wrong.
The architecture
The system follows a hub-and-spoke pattern. A Support Router workflow loads customer context, classifies the request, and routes it to the right specialist agent. Each agent owns its own tools, instructions, and domain. Memory is not an agent. It is infrastructure that loads context before routing and persists learnings after the agent responds.
┌──────────────────────────────────────────────────────────────────┐
│                    Support Router (Workflow)                     │
│                                                                  │
│   load_context ──▶ classify ──▶ route ──▶ ... ──▶ save_context   │
└───────────────────────────┬──────────────────────────────────────┘
                            │
      ┌───────────┬─────────┼─────────┬───────────┐
      │           │         │         │           │
      ▼           ▼         ▼         ▼           ▼
┌──────────┐ ┌─────────┐ ┌─────┐ ┌────────┐ ┌──────────┐
│Knowledge │ │Diagnostc│ │ Ops │ │Billing │ │Escalation│
│  Agent   │ │  Agent  │ │Agent│ │ Agent  │ │  Agent   │
└────┬─────┘ └────┬────┘ └──┬──┘ └───┬────┘ └────┬─────┘
     │            │         │        │           │
     ▼            ▼         ▼        ▼           ▼
 ┌────────┐ ┌─────────┐ ┌───────┐ ┌───────┐ ┌─────────┐
 │   KB   │ │SSH Diag │ │  CW   │ │Billing│ │Intercom │
 │  RAG   │ │  Tool   │ │  API  │ │  API  │ │   API   │
 └────────┘ └─────────┘ └───────┘ └───────┘ └─────────┘

┌──────────────────────────────────────────────────────────────────┐
│                      Shared Infrastructure                       │
│    Customer Context │ Conv History │ Change Log │ Billing/CRM    │
└──────────────────────────────────────────────────────────────────┘
The classifier re-runs on every new message. A customer can start with a knowledge question, follow up with a diagnostics request, and then ask about their billing, all in the same conversation. Each message gets reclassified and routed to the right specialist. Session state carries forward, so every agent knows what was already discussed. The customer sees one continuous conversation. Agent boundaries are invisible.
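To make the per-message reclassification concrete, here is a minimal sketch in plain Python, independent of ADK. The `Session` class, the keyword-based `classify` function, and the state keys are all stand-ins I made up for illustration; the real classifier is an LLM call, not a keyword match.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Conversation-scoped state shared by every specialist (hypothetical)."""
    state: dict = field(default_factory=dict)
    transcript: list = field(default_factory=list)


def classify(message: str) -> str:
    """Toy keyword classifier standing in for the LLM classifier."""
    text = message.lower()
    if "invoice" in text or "bill" in text:
        return "BILLING"
    if "slow" in text or "error" in text or "down" in text:
        return "DIAGNOSTICS"
    return "KNOWLEDGE"


def handle_message(session: Session, message: str) -> str:
    """Reclassify this message, then record it in the shared session."""
    intent = classify(message)
    session.transcript.append((intent, message))
    # Every specialist reads and writes the same state dict.
    session.state["last_intent"] = intent
    session.state.setdefault("intents_seen", []).append(intent)
    return intent


session = Session()
routed = [
    handle_message(session, "How do I add a domain?"),
    handle_message(session, "My site is slow since this morning"),
    handle_message(session, "Can you resend my last invoice?"),
]
print(routed)
```

Three messages, three different specialists, one session object. The point of the sketch is only that the routing decision is made per message while the state outlives each decision.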
When the AI cannot resolve something, the Escalation Agent creates an Intercom conversation with the full transcript attached, so the human agent picks up exactly where the AI left off. The customer never repeats themselves.
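The handoff idea can be sketched without touching Intercom at all. The payload shape below is hypothetical and is not Intercom's API; `build_handoff` and its field names are assumptions that only show the transcript travelling with the escalation.

```python
import json


def build_handoff(email: str, turns: list[tuple[str, str]], reason: str) -> dict:
    """Assemble a handoff payload with the full transcript (hypothetical shape)."""
    transcript = "\n".join(f"{speaker}: {text}" for speaker, text in turns)
    return {
        "customer_email": email,
        "reason": reason,
        "turn_count": len(turns),
        "transcript": transcript,
    }


turns = [
    ("customer", "My site returns 502 errors."),
    ("ai", "PHP-FPM looks unresponsive on your server."),
    ("customer", "I want to talk to a person."),
]
payload = build_handoff("user@example.com", turns, reason="customer asked for a human")
print(json.dumps(payload, indent=2))
```

Whatever the real payload ends up looking like, the human engineer should be able to read the transcript field top to bottom and not need to ask the customer anything twice.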
The scaffold
Here is the working scaffold. It compiles, runs, and grows across the series. Tool implementations are stubbed because each one gets its own post.
from google.adk import Agent, Workflow, Event, Context
from google.adk.workflow import node, RetryConfig
from pydantic import BaseModel
from typing import Literal
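
# --- Context loading and persistence ---

# Data-access helpers, stubbed here like the tools below so the module imports cleanly.
async def fetch_customer_profile(customer_id: str):
    """Customer profile lookup (stub)."""
    ...


async def fetch_recent_conversations(customer_id: str):
    """Recent conversation history lookup (stub)."""
    ...


async def persist_interaction(customer_id: str, issue: str, resolution: str):
    """Write the outcome of an interaction to the customer record (stub)."""
    ...


@node(retry_config=RetryConfig(max_attempts=2))
async def load_context(ctx: Context, node_input: str):
    """Pull customer profile and recent history into state."""
    customer_id = ctx.state.get("customer_id")
    if customer_id:
        ctx.state["customer_profile"] = await fetch_customer_profile(customer_id)
        ctx.state["customer_history"] = await fetch_recent_conversations(customer_id)
    return node_input


async def save_context(ctx: Context, node_input: str):
    """Persist what was learned after the agent responds."""
    customer_id = ctx.state.get("customer_id")
    diagnosis = ctx.state.get("diagnosis")
    # Only persist when there is both a customer and a diagnosis to record.
    if customer_id and diagnosis:
        await persist_interaction(
            customer_id=customer_id,
            issue=diagnosis,
            resolution=node_input,
        )
    return node_input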

# --- Tools (specifics come in later posts) ---

def search_knowledge_base(query: str):
    """Search the hosting knowledge base for relevant articles."""
    ...


def check_server_status(server_id: str):
    """CPU, memory, disk, and service health for a server."""
    ...


def get_server_logs(server_id: str, service: str, lines: int = 100):
    """Recent log entries for a service."""
    ...


def get_invoices(customer_id: str):
    """Recent invoices and payment status."""
    ...


def restart_service(server_id: str, service: str):
    """Restart a service. Confirmation gated by a callback."""
    ...


async def create_intercom_conversation(email: str, transcript: str):
    """Hand off to a human via Intercom with the full transcript attached."""
    ...

# --- Specialist agents ---

knowledge_agent = Agent(
    name="knowledge_agent",
    model="gemini-2.5-flash",
    instruction=(
        "Answer the hosting question using the knowledge base.\n"
        "Customer: {customer_profile}\n"
        "Always cite the specific KB article. "
        "If you cannot find the answer, say so."
    ),
    tools=[search_knowledge_base],
)

diagnostics_agent = Agent(
    name="diagnostics_agent",
    model="gemini-2.5-pro",
    instruction=(
        "Diagnose this server issue step by step.\n"
        "Customer: {customer_profile}\n"
        "Check server status first, then relevant logs, then correlate "
        "findings before suggesting a fix."
    ),
    tools=[check_server_status, get_server_logs],
)

ops_agent = Agent(
    name="ops_agent",
    model="gemini-2.5-flash",
    instruction=(
        "Execute approved server operations.\n"
        "Customer: {customer_profile}\n"
        "Never run destructive operations without confirmation."
    ),
    tools=[restart_service],
)

billing_agent = Agent(
    name="billing_agent",
    model="gemini-2.5-flash",
    instruction=(
        "Handle billing and plan questions.\n"
        "Customer: {customer_profile}\n"
        "Look up invoices and plan details. "
        "Never process refunds or plan changes. Escalate those."
    ),
    tools=[get_invoices],
)

escalation_agent = Agent(
    name="escalation_agent",
    model="gemini-2.5-flash",
    instruction=(
        "Hand the conversation off to a human support engineer.\n"
        "Customer: {customer_profile}\n"
        "Create an Intercom conversation and attach the full transcript."
    ),
    tools=[create_intercom_conversation],
)

# --- Intent classifier and router ---

class Intent(BaseModel):
    intent: Literal["KNOWLEDGE", "DIAGNOSTICS", "OPS", "BILLING", "ESCALATION"]
    urgency: Literal["LOW", "MEDIUM", "HIGH"]


classify = Agent(
    name="classify",
    model="gemini-2.5-flash",
    instruction=(
        "Classify this hosting support request.\n"
        "Customer: {customer_profile}\n"
        "KNOWLEDGE: how-to and configuration questions.\n"
        "DIAGNOSTICS: performance issues, errors, downtime.\n"
        "OPS: restart services, clear cache, apply approved changes.\n"
        "BILLING: invoices, plans, payments.\n"
        "ESCALATION: customer asks for a human."
    ),
    output_schema=Intent,
    output_key="classification",
)


def route(classification: Intent):
    yield Event(route=classification.intent)

# --- Workflow ---

root_agent = Workflow(
    name="support_router",
    edges=[
        ("START", load_context, classify, route),
        (route, {
            "KNOWLEDGE": knowledge_agent,
            "DIAGNOSTICS": diagnostics_agent,
            "OPS": ops_agent,
            "BILLING": billing_agent,
            "ESCALATION": escalation_agent,
        }),
        (knowledge_agent, save_context),
        (diagnostics_agent, save_context),
        (ops_agent, save_context),
        (billing_agent, save_context),
        (escalation_agent, save_context),
    ],
)
A few things worth pointing out:
load_context and save_context are plain Python nodes, not agents. They run on every pass through the workflow. Loading happens before the classifier so every downstream agent has the customer profile in state, available via the {customer_profile} placeholder in instructions. Saving happens after the agent responds, so the next conversation starts with the previous one’s outcome already persisted. In this scaffold, save_context only writes when state holds both a customer_id and a diagnosis, so exchanges that produce no diagnosis are not persisted.
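Here is a rough sketch of the load-then-save cycle with an in-memory store standing in for the real customer database. The three helper names match the ones the scaffold calls, but their bodies, the `limit` parameter, and the sample data are assumptions for illustration only.

```python
import asyncio
from collections import defaultdict

# In-memory stand-ins for the real customer store (assumption for this sketch).
_PROFILES = {"cust-1": {"plan": "business", "stack": "WordPress"}}
_HISTORY = defaultdict(list)


async def fetch_customer_profile(customer_id: str) -> dict:
    """Return the stored profile, or an empty dict for unknown customers."""
    return _PROFILES.get(customer_id, {})


async def fetch_recent_conversations(customer_id: str, limit: int = 5) -> list:
    """Return up to `limit` of the most recent persisted interactions."""
    return _HISTORY[customer_id][-limit:]


async def persist_interaction(customer_id: str, issue: str, resolution: str) -> None:
    """Append one interaction outcome to the customer's history."""
    _HISTORY[customer_id].append({"issue": issue, "resolution": resolution})


async def demo() -> list:
    # First conversation ends with a diagnosis being saved.
    await persist_interaction("cust-1", "PHP memory limit hit", "Raised limit to 512M")
    # The next conversation starts by loading what was saved.
    return await fetch_recent_conversations("cust-1")


history = asyncio.run(demo())
print(history)
```

The second conversation sees the first one's outcome without any agent being involved, which is what I mean by memory being infrastructure.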
classify uses output_schema=Intent to force a typed response. output_key="classification" writes that response into session state under the name classification, which the route function picks up by parameter name. route then yields an Event(route=...), and the workflow uses the dict in the next edge to pick the right specialist.
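The typed-response-then-dispatch idea can be tried out with the standard library alone. This is a sketch, not how ADK does it internally: the dataclass validation stands in for what the Pydantic schema enforces, and falling back to escalation on a bad classification is my assumption, not something the scaffold does.

```python
import json
from dataclasses import dataclass

INTENTS = {"KNOWLEDGE", "DIAGNOSTICS", "OPS", "BILLING", "ESCALATION"}
URGENCIES = {"LOW", "MEDIUM", "HIGH"}


@dataclass(frozen=True)
class Intent:
    intent: str
    urgency: str

    def __post_init__(self):
        # Reject anything outside the allowed labels, as a schema would.
        if self.intent not in INTENTS:
            raise ValueError(f"unknown intent: {self.intent!r}")
        if self.urgency not in URGENCIES:
            raise ValueError(f"unknown urgency: {self.urgency!r}")


def parse_classification(raw: str) -> Intent:
    """Parse the classifier's JSON text into a validated Intent."""
    return Intent(**json.loads(raw))


# Route table: intent label -> specialist name (names mirror the scaffold).
ROUTES = {
    "KNOWLEDGE": "knowledge_agent",
    "DIAGNOSTICS": "diagnostics_agent",
    "OPS": "ops_agent",
    "BILLING": "billing_agent",
    "ESCALATION": "escalation_agent",
}


def pick_specialist(raw: str) -> str:
    """Map raw classifier output to a specialist, escalating on bad output."""
    try:
        return ROUTES[parse_classification(raw).intent]
    except (ValueError, TypeError, KeyError):
        # Falling back to a human is an assumption of this sketch.
        return ROUTES["ESCALATION"]


print(pick_specialist('{"intent": "BILLING", "urgency": "LOW"}'))
```

Because the label set is closed, a response outside it fails loudly at parse time instead of silently routing nowhere.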
The @node(retry_config=...) decorator handles the case where the customer profile API is briefly unavailable. ADK 2.0 catches the exception, waits, and retries automatically. Each agent can have its own retry policy, which becomes important once you have five of them and a few external APIs in the mix.
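If you have not used a retry wrapper before, the behavior is roughly the following. This is a hand-rolled sketch in plain asyncio, not ADK's implementation; `with_retry`, `delay_seconds`, and `flaky_profile_lookup` are names I invented to show the shape of it.

```python
import asyncio
import functools


def with_retry(max_attempts: int = 2, delay_seconds: float = 0.0):
    """Retry an async function on any exception, up to max_attempts tries."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return await func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # Out of attempts: surface the error.
                    await asyncio.sleep(delay_seconds)
        return wrapper
    return decorator


calls = {"count": 0}


@with_retry(max_attempts=2, delay_seconds=0.0)
async def flaky_profile_lookup(customer_id: str) -> dict:
    """Fails once, then succeeds, to mimic a brief API outage."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("profile API unavailable")
    return {"customer_id": customer_id, "plan": "business"}


profile = asyncio.run(flaky_profile_lookup("cust-1"))
print(profile, calls)
```

The first call fails, the wrapper waits and tries again, and the caller only ever sees the successful result. When every attempt fails, the last exception propagates so the workflow can decide what to do next.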
All 32 patterns and where they fit
| # | Pattern | Used? | Where it fits |
|---|---|---|---|
| P1 | Logits Masking | No | Requires self-hosted models. |
| P2 | Grammar | Yes | Structured JSON for intent classification, diagnostic reports, escalation payloads. |
| P3 | Style Transfer | Yes | Formal for enterprise customers, conversational for small business owners. |
| P4 | Reverse Neutralization | Yes | One diagnostic finding, restyled for the customer vs. for the human handoff note. |
| P5 | Content Optimization | Yes | Preference tuning to figure out which response templates actually resolve conversations faster. |
| P6 | Basic RAG | Yes | Grounding answers in KB articles, platform docs, and resolved conversations. |
| P7 | Semantic Indexing | Yes | “My WordPress keeps crashing” should match PHP memory and OPcache articles, not anything containing “crash”. |
| P8 | Indexing at Scale | Yes | KB articles get updated, new platform features launch, deprecated procedures need to be removed. |
| P9 | Index-Aware Retrieval | Yes | HyDE for vague queries, GraphRAG for connecting related issues. |
| P10 | Node Postprocessing | Yes | Reranking, deduplication, filtering by stack before the LLM sees any context. |
| P11 | Trustworthy Generation | Yes | Citations to specific KB articles, plus confidence detection. |
| P12 | Deep Search | Yes | Multi-hop retrieval when the root cause is three levels deep. |
| P13 | Chain of Thought | Yes | Step-by-step diagnostics. Check status, examine logs, correlate metrics, narrow down. |
| P14 | Tree of Thoughts | No | Chain of Thought covers the main case. ToT adds cost without proportional benefit. |
| P15 | Adapter Tuning | Yes | Fine-tuning for hosting domain vocabulary. |
| P16 | Evol-Instruct | Yes | Generates training data from resolved conversations without manual labeling. |
| P17 | LLM-as-Judge | No | Belongs to a separate QA system that runs after conversations close. Out of scope here. |
| P18 | Reflection | Yes | Draft, critique, revise for responses where wrong technical details can cause data loss. |
| P19 | Dependency Injection | Yes | Swap models and mock server APIs so you can test without touching real infrastructure. |
| P20 | Prompt Optimization | Yes | Systematic updates when models change or new features need updated instructions. |
| P21 | Tool Calling | Yes | Cloudways API and an SSH diagnostic tool for server management, deployment, DNS, SSL, backups. |
| P22 | Code Execution | Yes | Diagnostic scripts. Parse log files, analyze slow query logs, check connection counts. |
| P23 | Multiagent Collaboration | Yes | The Support Router itself. Specialists for knowledge, diagnostics, ops, billing, escalation. |
| P24 | Small Language Model | Yes | Cheap, fast models for intent classification and routing. |
| P25 | Prompt Caching | Yes | The same 50 questions about DNS, SSL, and PHP versions account for a large chunk of traffic. |
| P26 | Inference Optimization | No | Continuous batching and speculative decoding only matter if you self-host. |
| P27 | Degradation Testing | Yes | Measures response quality under load, not just whether the system stays up. |
| P28 | Long-Term Memory | Yes | Remembers customer infrastructure across sessions. Implemented as load_context and save_context. |
| P29 | Template Generation | Yes | Structured diagnostic reports the LLM fills in but cannot fabricate. |
| P30 | Assembled Reformat | Yes | Pull server facts from the API. Let the LLM only handle explanation and formatting. |
| P31 | Self-Check | Yes | Log-probabilities to flag uncertain diagnostic claims. Surface the uncertainty instead of guessing. |
| P32 | Guardrails | Yes | Blocks destructive operations, validates server IDs, prevents the agent from doing things it shouldn’t. |
The four skips are P1, P14, P17, and P26. Two of them require self-hosting. One adds cost without proportional benefit. One belongs to a different system.
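Of the patterns that made the cut, Guardrails (P32) is the easiest to sketch in isolation. The version below is a minimal illustration, not the implementation from the later post. The numeric server ID format, the `delete_backup` operation, and every function name here are assumptions; only restart_service appears in the scaffold.

```python
import re

# Assumed format for this sketch: server IDs are 1 to 10 digits.
SERVER_ID_PATTERN = re.compile(r"\d{1,10}")
# Operations that need explicit confirmation (delete_backup is hypothetical).
CONFIRMATION_REQUIRED = {"restart_service", "delete_backup"}


class GuardrailViolation(Exception):
    """Raised when a requested operation must not run."""


def check_operation(op: str, server_id: str, confirmed: bool = False) -> None:
    """Raise GuardrailViolation unless the operation is safe to run."""
    if not SERVER_ID_PATTERN.fullmatch(server_id):
        raise GuardrailViolation(f"invalid server id: {server_id!r}")
    if op in CONFIRMATION_REQUIRED and not confirmed:
        raise GuardrailViolation(f"{op} requires customer confirmation")


def is_allowed(op: str, server_id: str, confirmed: bool = False) -> bool:
    """Boolean convenience wrapper around check_operation."""
    try:
        check_operation(op, server_id, confirmed)
        return True
    except GuardrailViolation:
        return False


print(is_allowed("check_server_status", "12345"))
print(is_allowed("restart_service", "12345"))
print(is_allowed("restart_service", "12345", confirmed=True))
```

The check runs before the tool call, in ordinary code, so it does not depend on the model following its instructions.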
What’s coming
Each post in the series picks one piece of the system and the patterns that power it:
| Post | Focus | Patterns |
|---|---|---|
| 1 | Overview (this post) | All 32 mapped |
| 2 | Knowledge Agent and RAG | Basic RAG, Semantic Indexing, Node Postprocessing, Trustworthy Generation |
| 3 | Response quality and templates | Style Transfer, Template Generation |
| 4 | Safety for server operations | Self-Check, Guardrails |
| 5 | Diagnostic reasoning | Chain of Thought, Reflection |
| 6 | MCP tools and script execution | Tool Calling, Code Execution |
| 7 | Multi-agent coordination | Multiagent Collaboration |
| 8 | Customer context memory | Long-Term Memory |
| 9 | Testing and reliability | Dependency Injection, Degradation Testing |
| 10 | Full architecture | All 28 patterns composed |
Every post has working code, the tradeoffs I actually ran into building this against live hosting infrastructure, and the places where things didn’t work the first time.
Thanks for reading.