Applying 32 GenAI design patterns to a real project

Posted on Jun 2, 2026

I have been working in managed cloud hosting for about seven years. In that time, I have built a few production AI systems that handle real customer conversations and diagnose server issues. One of them handles about 30% of incoming conversations on its own. Another cut investigation time for complex server issues from 30 minutes down to about 8 minutes.

When I picked up Lakshmanan and Hapke’s Generative AI Design Patterns (O’Reilly, 2025), I wanted to see how many of their 32 patterns mapped to what I want to build next: a context-aware AI hosting support agent that can answer informational questions, debug complex server issues, and suggest application code optimizations for speed and security. Of the 32 patterns, 28 apply. The other four don’t, because they require self-hosted models, add cost without proportional benefit, or belong to a separate system that is out of scope for what we are building here.

This is a 10-part series. Every post takes one piece of the system, picks the patterns that power it, and shows the code. The implementation uses Google ADK 2.0 and runs against a real hosting platform with real servers and real consequences when the AI gets a diagnosis wrong.

The architecture

The system follows a hub-and-spoke pattern. A Support Router workflow loads customer context, classifies the request, and routes it to the right specialist agent. Each agent owns its own tools, instructions, and domain. Memory is not an agent. It is infrastructure that loads context before routing and persists learnings after the agent responds.

   ┌──────────────────────────────────────────────────────────────────┐
   │                   Support Router (Workflow)                      │
   │                                                                  │
   │  load_context ──▶ classify ──▶ route ──▶ ... ──▶ save_context    │
   └───────────────────────────┬──────────────────────────────────────┘
                               │
         ┌───────────┬─────────┼─────────┬───────────┐
         │           │         │         │           │
         ▼           ▼         ▼         ▼           ▼
   ┌─────────┐ ┌──────────┐ ┌─────┐ ┌────────┐ ┌──────────┐
   │Knowledge│ │Diagnostic│ │ Ops │ │Billing │ │Escalation│
   │  Agent  │ │  Agent   │ │Agent│ │ Agent  │ │  Agent   │
   └────┬────┘ └─────┬────┘ └──┬──┘ └───┬────┘ └────┬─────┘
        │            │         │        │           │
        ▼            ▼         ▼        ▼           ▼
   ┌────────┐  ┌─────────┐ ┌───────┐ ┌───────┐ ┌─────────┐
   │  KB    │  │SSH Diag │ │  CW   │ │Billing│ │Intercom │
   │  RAG   │  │  Tool   │ │  API  │ │  API  │ │  API    │
   └────────┘  └─────────┘ └───────┘ └───────┘ └─────────┘
   ┌──────────────────────────────────────────────────────────────────┐
   │                    Shared Infrastructure                         │
   │  Customer Context │ Conv History │ Change Log │ Billing/CRM      │
   └──────────────────────────────────────────────────────────────────┘

The classifier re-runs on every new message. A customer can start with a knowledge question, follow up with a diagnostics request, and then ask about their billing, all in the same conversation. Each message gets reclassified and routed to the right specialist. Session state carries forward, so every agent knows what was already discussed. The customer sees one continuous conversation. Agent boundaries are invisible.
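The following framework-free sketch makes the per-message routing concrete. Each incoming message is reclassified and passed to a specialist, and the turn is then appended to one shared session dict that later turns can read. The keyword rules in `classify_message` and the `specialist` stand-in are hypothetical placeholders for the LLM classifier and the real agents. The sketch illustrates the flow only, not ADK’s session machinery.

```python
# Hypothetical keyword rules standing in for the LLM classifier.
RULES = [
    ("ESCALATION", ("human", "person")),
    ("BILLING", ("invoice", "bill", "payment")),
    ("DIAGNOSTICS", ("slow", "error", "crash")),
]


def classify_message(text: str) -> str:
    """Return the first intent whose keywords appear, else KNOWLEDGE."""
    lowered = text.lower()
    for intent, keywords in RULES:
        if any(word in lowered for word in keywords):
            return intent
    return "KNOWLEDGE"


def specialist(intent: str, text: str, session: dict) -> str:
    """Stand-in for a specialist agent; it can read every earlier turn."""
    earlier = [turn["intent"] for turn in session["turns"]]
    return f"[{intent}] handled; earlier intents: {earlier}"


def handle_turn(session: dict, text: str) -> str:
    intent = classify_message(text)  # reclassified on every message
    reply = specialist(intent, text, session)
    # Session state carries forward to the next turn, whoever handles it.
    session["turns"].append({"intent": intent, "text": text, "reply": reply})
    return reply


session = {"turns": []}
for message in (
    "How do I point my domain at the server?",
    "My WordPress site keeps throwing a 502 error",
    "Why is my last invoice higher than usual?",
):
    print(handle_turn(session, message))
```

Three messages in one session go to three different specialists. The billing turn can still see that knowledge and diagnostics turns happened before it.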

When the AI cannot resolve something, the Escalation Agent creates an Intercom conversation with the full transcript attached, so the human agent picks up exactly where the AI left off. The customer never repeats themselves.
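The handoff only works if the transcript is complete and readable. Below is a minimal sketch of flattening the session’s turns into the `transcript` string that `create_intercom_conversation` accepts. The `Turn` dataclass and the line format are assumptions for illustration, not part of Intercom’s API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    role: str                     # "customer" or "assistant"
    text: str
    agent: Optional[str] = None   # which specialist answered, if any


def format_transcript(turns: list[Turn]) -> str:
    """Render turns oldest-first, tagging each assistant turn with its agent."""
    lines = []
    for turn in turns:
        speaker = f"{turn.role} ({turn.agent})" if turn.agent else turn.role
        lines.append(f"{speaker}: {turn.text}")
    return "\n".join(lines)


# Illustrative sample turns, not real conversation data.
turns = [
    Turn("customer", "My site returns 502 errors."),
    Turn("assistant", "Status and logs checked; PHP workers look saturated.", "diagnostics_agent"),
    Turn("customer", "Please let a human take over."),
]
transcript = format_transcript(turns)
print(transcript)
```

Tagging each reply with the agent that produced it tells the human engineer which specialist made which claim.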

The scaffold

Here is the working scaffold. It compiles, runs, and grows across the series. Tool implementations are stubbed because each one gets its own post.

from google.adk import Agent, Workflow, Event, Context
from google.adk.workflow import node, RetryConfig
from pydantic import BaseModel
from typing import Literal


# --- Context loading and persistence ---

@node(retry_config=RetryConfig(max_attempts=2))
async def load_context(ctx: Context, node_input: str):
    """Pull customer profile and recent history into state."""
    customer_id = ctx.state.get("customer_id")
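    if customer_id:
        ctx.state["customer_profile"] = await fetch_customer_profile(customer_id)
        ctx.state["customer_history"] = await fetch_recent_conversations(customer_id)
    else:
        # Every instruction references {customer_profile}, so the key must
        # exist even when the session has no customer_id.
        ctx.state["customer_profile"] = "unknown customer"
        ctx.state["customer_history"] = []
    return node_input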


async def save_context(ctx: Context, node_input: str):
    """Persist what was learned after the agent responds."""
    customer_id = ctx.state.get("customer_id")
    diagnosis = ctx.state.get("diagnosis")
    if customer_id and diagnosis:
        await persist_interaction(
            customer_id=customer_id,
            issue=diagnosis,
            resolution=node_input,
        )
    return node_input


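# --- Data-access helpers (stubbed, like the tools below) ---

async def fetch_customer_profile(customer_id: str):
    """Customer profile from the hosting platform."""
    ...

async def fetch_recent_conversations(customer_id: str):
    """The customer's recent support conversations."""
    ...

async def persist_interaction(customer_id: str, issue: str, resolution: str):
    """Store the outcome of this conversation for the next one."""
    ...


# --- Tools (specifics come in later posts) ---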

def search_knowledge_base(query: str):
    """Search the hosting knowledge base for relevant articles."""
    ...

def check_server_status(server_id: str):
    """CPU, memory, disk, and service health for a server."""
    ...

def get_server_logs(server_id: str, service: str, lines: int = 100):
    """Recent log entries for a service."""
    ...

def get_invoices(customer_id: str):
    """Recent invoices and payment status."""
    ...

def restart_service(server_id: str, service: str):
    """Restart a service. Confirmation gated by a callback."""
    ...

async def create_intercom_conversation(email: str, transcript: str):
    """Hand off to a human via Intercom with the full transcript attached."""
    ...


# --- Specialist agents ---

knowledge_agent = Agent(
    name="knowledge_agent",
    model="gemini-2.5-flash",
    instruction=(
        "Answer the hosting question using the knowledge base.\n"
        "Customer: {customer_profile}\n"
        "Always cite the specific KB article. "
        "If you cannot find the answer, say so."
    ),
    tools=[search_knowledge_base],
)

diagnostics_agent = Agent(
    name="diagnostics_agent",
    model="gemini-2.5-pro",
    instruction=(
        "Diagnose this server issue step by step.\n"
        "Customer: {customer_profile}\n"
        "Check server status first, then relevant logs, then correlate "
        "findings before suggesting a fix."
    ),
    tools=[check_server_status, get_server_logs],
    output_key="diagnosis",  # save_context reads this from state
)

ops_agent = Agent(
    name="ops_agent",
    model="gemini-2.5-flash",
    instruction=(
        "Execute approved server operations.\n"
        "Customer: {customer_profile}\n"
        "Never run destructive operations without confirmation."
    ),
    tools=[restart_service],
)

billing_agent = Agent(
    name="billing_agent",
    model="gemini-2.5-flash",
    instruction=(
        "Handle billing and plan questions.\n"
        "Customer: {customer_profile}\n"
        "Look up invoices and plan details. "
        "Never process refunds or plan changes. Escalate those."
    ),
    tools=[get_invoices],
)

escalation_agent = Agent(
    name="escalation_agent",
    model="gemini-2.5-flash",
    instruction=(
        "Hand the conversation off to a human support engineer.\n"
        "Customer: {customer_profile}\n"
        "Create an Intercom conversation and attach the full transcript."
    ),
    tools=[create_intercom_conversation],
)


# --- Intent classifier and router ---

class Intent(BaseModel):
    intent: Literal["KNOWLEDGE", "DIAGNOSTICS", "OPS", "BILLING", "ESCALATION"]
    urgency: Literal["LOW", "MEDIUM", "HIGH"]


classify = Agent(
    name="classify",
    model="gemini-2.5-flash",
    instruction=(
        "Classify this hosting support request.\n"
        "Customer: {customer_profile}\n"
        "KNOWLEDGE: how-to and configuration questions.\n"
        "DIAGNOSTICS: performance issues, errors, downtime.\n"
        "OPS: restart services, clear cache, apply approved changes.\n"
        "BILLING: invoices, plans, payments.\n"
        "ESCALATION: customer asks for a human."
    ),
    output_schema=Intent,
    output_key="classification",
)


def route(classification: Intent):
    yield Event(route=classification.intent)


# --- Workflow ---

root_agent = Workflow(
    name="support_router",
    edges=[
        ("START", load_context, classify, route),
        (route, {
            "KNOWLEDGE": knowledge_agent,
            "DIAGNOSTICS": diagnostics_agent,
            "OPS": ops_agent,
            "BILLING": billing_agent,
            "ESCALATION": escalation_agent,
        }),
        (knowledge_agent, save_context),
        (diagnostics_agent, save_context),
        (ops_agent, save_context),
        (billing_agent, save_context),
        (escalation_agent, save_context),
    ],
)

A few things worth pointing out:

load_context and save_context are plain Python nodes, not agents. They run on every turn, one before and one after the specialist. Loading happens before the classifier, so every downstream agent has the customer profile in state, available via the {customer_profile} placeholder in instructions. Saving happens after the specialist responds, so the next conversation starts with the previous one’s outcome already persisted.
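The next framework-free sketch shows the load/save pair that wraps each turn, with an in-memory dict standing in for the real persistence layer. `STORE`, `load_step`, and `save_step` are hypothetical names that mirror `load_context` and `save_context`. The point is that a brand-new session for the same customer starts with the earlier outcome already in its history.

```python
import asyncio

# Hypothetical in-memory stand-in for the persistence layer,
# keyed by customer_id.
STORE: dict[str, list[dict]] = {}


async def load_step(state: dict) -> None:
    """Before routing: copy the customer's past outcomes into session state."""
    customer_id = state.get("customer_id")
    state["customer_history"] = list(STORE.get(customer_id, [])) if customer_id else []


async def save_step(state: dict, reply: str) -> None:
    """After the specialist responds: persist the outcome if there is one."""
    customer_id = state.get("customer_id")
    diagnosis = state.get("diagnosis")
    if customer_id and diagnosis:
        STORE.setdefault(customer_id, []).append(
            {"issue": diagnosis, "resolution": reply}
        )


async def main() -> None:
    # Conversation 1: a diagnostics turn leaves its finding in state.
    first = {"customer_id": "c-42"}
    await load_step(first)
    first["diagnosis"] = "PHP memory exhaustion"  # written by the specialist
    await save_step(first, "Raised the PHP memory limit.")

    # Conversation 2: a fresh session for the same customer.
    second = {"customer_id": "c-42"}
    await load_step(second)
    print(second["customer_history"])


asyncio.run(main())
```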

classify uses output_schema=Intent to force a typed response. output_key="classification" writes that response into session state under the name classification, which the route function picks up by parameter name. route then yields an Event(route=...), and the workflow uses the dict in the next edge to pick the right specialist.
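To show what the typed response buys you, here is a sketch that uses Pydantic v2 directly rather than ADK. The raw JSON strings stand in for model output, and `ROUTES` is a plain dict standing in for the workflow’s route mapping. Falling back to escalation when the output doesn’t validate is my assumption about a sensible default, not something the scaffold does.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class Intent(BaseModel):
    intent: Literal["KNOWLEDGE", "DIAGNOSTICS", "OPS", "BILLING", "ESCALATION"]
    urgency: Literal["LOW", "MEDIUM", "HIGH"]


# Plain-dict stand-in for the workflow's route mapping.
ROUTES = {
    "KNOWLEDGE": "knowledge_agent",
    "DIAGNOSTICS": "diagnostics_agent",
    "OPS": "ops_agent",
    "BILLING": "billing_agent",
    "ESCALATION": "escalation_agent",
}


def pick_specialist(raw_json: str) -> str:
    """Validate classifier output against Intent, then look up the route."""
    try:
        parsed = Intent.model_validate_json(raw_json)
    except ValidationError:
        # Assumed fallback: anything off-schema goes to a human.
        return ROUTES["ESCALATION"]
    return ROUTES[parsed.intent]


print(pick_specialist('{"intent": "DIAGNOSTICS", "urgency": "HIGH"}'))  # diagnostics_agent
print(pick_specialist('{"intent": "REFUND", "urgency": "LOW"}'))  # escalation_agent
```

Because `intent` is a `Literal`, a made-up label like `REFUND` fails validation instead of producing a route key that does not exist.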

The @node(retry_config=...) decorator handles the case where the customer profile API is briefly unavailable. ADK 2.0 catches the exception, waits, and retries automatically. Each node can have its own retry policy, which becomes important once you have five specialist agents and several external APIs in the mix.
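The retry behavior itself is easy to reason about. The sketch below is a generic asyncio version of the idea, not ADK’s implementation. `with_retries` and `flaky_profile_api` are hypothetical. Treating `max_attempts` as the total number of attempts, so `2` means one retry, is an assumption about `RetryConfig`’s semantics.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    call: Callable[[], Awaitable[T]],
    max_attempts: int = 2,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Await call(), retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Usage: a hypothetical profile API that fails once, then recovers.
calls = {"count": 0}


async def flaky_profile_api() -> dict:
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("profile API briefly unavailable")
    return {"customer_id": "c-42"}


profile = asyncio.run(with_retries(flaky_profile_api, max_attempts=2, base_delay=0.01))
print(profile, "after", calls["count"], "attempts")
```

Only the listed transient errors are retried. A bug such as a `KeyError` still fails fast instead of being retried.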

All 32 patterns and where they fit

#	Pattern	Used?	Where it fits
P1	Logits Masking	No	Requires self-hosted models.
P2	Grammar	Yes	Structured JSON for intent classification, diagnostic reports, escalation payloads.
P3	Style Transfer	Yes	Formal for enterprise customers, conversational for small business owners.
P4	Reverse Neutralization	Yes	One diagnostic finding, restyled for the customer vs. for the human handoff note.
P5	Content Optimization	Yes	Preference tuning to figure out which response templates actually resolve conversations faster.
P6	Basic RAG	Yes	Grounding answers in KB articles, platform docs, and resolved conversations.
P7	Semantic Indexing	Yes	“My WordPress keeps crashing” should match PHP memory and OPcache articles, not anything containing “crash”.
P8	Indexing at Scale	Yes	KB articles get updated, new platform features launch, deprecated procedures need to be removed.
P9	Index-Aware Retrieval	Yes	HyDE for vague queries, GraphRAG for connecting related issues.
P10	Node Postprocessing	Yes	Reranking, deduplication, filtering by stack before the LLM sees any context.
P11	Trustworthy Generation	Yes	Citations to specific KB articles, plus confidence detection.
P12	Deep Search	Yes	Multi-hop retrieval when the root cause is three levels deep.
P13	Chain of Thought	Yes	Step-by-step diagnostics. Check status, examine logs, correlate metrics, narrow down.
P14	Tree of Thoughts	No	Chain of Thought covers the main case. ToT adds cost without proportional benefit.
P15	Adapter Tuning	Yes	Fine-tuning for hosting domain vocabulary.
P16	Evol-Instruct	Yes	Generates training data from resolved conversations without manual labeling.
P17	LLM-as-Judge	No	Belongs to a separate QA system that runs after conversations close. Out of scope here.
P18	Reflection	Yes	Draft, critique, revise for responses where wrong technical details can cause data loss.
P19	Dependency Injection	Yes	Swap models and mock server APIs so you can test without touching real infrastructure.
P20	Prompt Optimization	Yes	Systematic updates when models change or new features need updated instructions.
P21	Tool Calling	Yes	Cloudways API and an SSH diagnostic tool for server management, deployment, DNS, SSL, backups.
P22	Code Execution	Yes	Diagnostic scripts. Parse log files, analyze slow query logs, check connection counts.
P23	Multiagent Collaboration	Yes	The Support Router itself. Specialists for knowledge, diagnostics, ops, billing, escalation.
P24	Small Language Model	Yes	Cheap, fast models for intent classification and routing.
P25	Prompt Caching	Yes	The same 50 questions about DNS, SSL, and PHP versions account for a large chunk of traffic.
P26	Inference Optimization	No	Continuous batching and speculative decoding only matter if you self-host.
P27	Degradation Testing	Yes	Measures response quality under load, not just whether the system stays up.
P28	Long-Term Memory	Yes	Remembers customer infrastructure across sessions. Implemented as `load_context` and `save_context`.
P29	Template Generation	Yes	Structured diagnostic reports the LLM fills in but cannot fabricate.
P30	Assembled Reformat	Yes	Pull server facts from the API. Let the LLM only handle explanation and formatting.
P31	Self-Check	Yes	Log-probabilities to flag uncertain diagnostic claims. Surface the uncertainty instead of guessing.
P32	Guardrails	Yes	Blocks destructive operations, validates server IDs, prevents the agent from doing things it shouldn’t.

The four skips are P1, P14, P17, and P26. Two of them require self-hosting. One adds cost without proportional benefit. One belongs to a different system.
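The Guardrails row (P32) gets its own post, but the core check fits in a few lines. This minimal sketch of a pre-tool check does three things:

- validates the server ID,
- confirms the server belongs to the customer,
- blocks destructive tools until the customer has confirmed.

The ID format, the `customer_servers` and `confirmed_operation` state keys, and the tool list are all hypothetical. In ADK, a check like this would sit in a tool callback, which is where the `restart_service` docstring says confirmation is gated.

```python
import re
from typing import Optional

# Hypothetical policy inputs: which tools change state, and what a valid
# server ID looks like on this platform.
DESTRUCTIVE_TOOLS = {"restart_service"}
SERVER_ID = re.compile(r"srv-[0-9]{3,8}")


def guard_tool_call(tool_name: str, args: dict, state: dict) -> Optional[dict]:
    """Return None to allow the tool call, or an error dict to block it."""
    server_id = args.get("server_id")
    if server_id is not None:
        # Reject malformed IDs before they reach an API or an SSH command.
        if not SERVER_ID.fullmatch(server_id):
            return {"error": f"invalid server_id {server_id!r}"}
        # Only touch servers this customer actually owns.
        if server_id not in state.get("customer_servers", []):
            return {"error": f"{server_id} does not belong to this customer"}
    # Destructive tools need a matching, explicit confirmation in state.
    if tool_name in DESTRUCTIVE_TOOLS and state.get("confirmed_operation") != (tool_name, server_id):
        return {"error": f"{tool_name} needs customer confirmation first"}
    return None


state = {"customer_servers": ["srv-1001"]}
restart = {"server_id": "srv-1001", "service": "nginx"}
print(guard_tool_call("check_server_status", {"server_id": "srv-1001"}, state))
print(guard_tool_call("restart_service", restart, state))
state["confirmed_operation"] = ("restart_service", "srv-1001")
print(guard_tool_call("restart_service", restart, state))
```

Returning an error dict instead of raising lets the agent tell the customer why the operation was refused.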

What’s coming

Each post in the series picks one agent and the patterns that power it:

Post	Focus	Patterns
1	Overview (this post)	All 32 mapped
2	Knowledge Agent and RAG	Basic RAG, Semantic Indexing, Node Postprocessing, Trustworthy Generation
3	Response quality and templates	Style Transfer, Template Generation
4	Safety for server operations	Self-Check, Guardrails
5	Diagnostic reasoning	Chain of Thought, Reflection
6	MCP tools and script execution	Tool Calling, Code Execution
7	Multi-agent coordination	Multiagent Collaboration
8	Customer context memory	Long-Term Memory
9	Testing and reliability	Dependency Injection, Degradation Testing
10	Full architecture	All 28 patterns composed

Every post has working code, the tradeoffs I actually ran into building this against live hosting infrastructure, and the places where things didn’t work the first time.

Thanks for reading.