Applying 32 GenAI design patterns to a real project

Posted on Jun 2, 2026

I have been working in managed cloud hosting for about seven years. In that time, I have built a few production AI systems that handle real customer conversations and diagnose server issues. One of them handles about 30% of incoming conversations on its own. Another cut investigation time for complex server issues from 30 minutes down to about 8 minutes.

When I picked up Lakshmanan and Hapke’s Generative AI Design Patterns (O’Reilly, 2025), I wanted to see how many of their 32 patterns mapped to what I want to build next: a context-aware AI hosting support agent that can answer informational questions, debug complex server issues, and suggest application code optimizations for speed and security. Of the 32 patterns, 28 apply. The other four don’t: two require self-hosted models, one adds cost without proportional benefit, and one belongs to a separate system that is out of scope for what we are building here.

This is a 10-part series. Every post takes one piece of the system, picks the patterns that power it, and shows the code. The implementation uses Google ADK 2.0 and runs against a real hosting platform with real servers and real consequences when the AI gets a diagnosis wrong.

The architecture

The system follows a hub-and-spoke pattern. A Support Router workflow loads customer context, classifies the request, and routes it to the right specialist agent. Each agent owns its own tools, instructions, and domain. Memory is not an agent. It is infrastructure that loads context before routing and persists learnings after the agent responds.

   ┌──────────────────────────────────────────────────────────────────┐
   │                   Support Router (Workflow)                      │
   │                                                                  │
   │  load_context ──▶ classify ──▶ route ──▶ ... ──▶ save_context    │
   └───────────────────────────┬──────────────────────────────────────┘
         ┌───────────┬─────────┼─────────┬───────────┐
         │           │         │         │           │
         ▼           ▼         ▼         ▼           ▼
   ┌──────────┐ ┌─────────┐ ┌─────┐ ┌────────┐ ┌──────────┐
   │Knowledge │ │Diagnostc│ │ Ops │ │Billing │ │Escalation│
   │  Agent   │ │  Agent  │ │Agent│ │ Agent  │ │  Agent   │
   └────┬─────┘ └────┬────┘ └──┬──┘ └───┬────┘ └────┬─────┘
        │            │         │        │           │
        ▼            ▼         ▼        ▼           ▼
   ┌────────┐  ┌─────────┐ ┌───────┐ ┌───────┐ ┌─────────┐
   │  KB    │  │SSH Diag │ │  CW   │ │Billing│ │Intercom │
   │  RAG   │  │  Tool   │ │  API  │ │  API  │ │  API    │
   └────────┘  └─────────┘ └───────┘ └───────┘ └─────────┘
   ┌──────────────────────────────────────────────────────────────────┐
   │                    Shared Infrastructure                         │
   │  Customer Context │ Conv History │ Change Log │ Billing/CRM      │
   └──────────────────────────────────────────────────────────────────┘

The classifier re-runs on every new message. A customer can start with a knowledge question, follow up with a diagnostics request, and then ask about their billing, all in the same conversation. Each message gets reclassified and routed to the right specialist. Session state carries forward, so every agent knows what was already discussed. The customer sees one continuous conversation. Agent boundaries are invisible.
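The per-message loop can be pictured without any framework. The following is a minimal sketch in plain Python, not ADK code: the keyword classifier, the specialist handlers, and the `session` dict are all hypothetical stand-ins for the classifier agent, the specialist agents, and session state.

```python
from typing import Callable


# Hypothetical keyword classifier standing in for the LLM classifier.
def classify_message(text: str) -> str:
    lowered = text.lower()
    if "invoice" in lowered or "billing" in lowered:
        return "BILLING"
    if "slow" in lowered or "error" in lowered or "down" in lowered:
        return "DIAGNOSTICS"
    return "KNOWLEDGE"


# Hypothetical specialists: each one reads and appends to the same session dict.
def make_specialist(name: str) -> Callable[[dict, str], str]:
    def handle(session: dict, text: str) -> str:
        seen = len(session["history"])  # turns handled before this one
        session["history"].append((name, text))
        return f"{name} handled turn {seen + 1}"
    return handle


SPECIALISTS = {
    intent: make_specialist(intent)
    for intent in ("KNOWLEDGE", "DIAGNOSTICS", "BILLING")
}


def handle_message(session: dict, text: str) -> str:
    """Reclassify every message, then dispatch with the shared session."""
    intent = classify_message(text)
    return SPECIALISTS[intent](session, text)


session = {"history": []}
replies = [
    handle_message(session, "How do I point my domain at the server?"),
    handle_message(session, "The site is slow since this morning"),
    handle_message(session, "Where is my latest invoice?"),
]
```

Three different specialists answer, but all of them write into the same `session["history"]`, which is the property that keeps agent boundaries invisible to the customer.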

When the AI cannot resolve something, the Escalation Agent creates an Intercom conversation with the full transcript attached, so the human agent picks up exactly where the AI left off. The customer never repeats themselves.

The scaffold

Here is the working scaffold. It compiles, runs, and grows across the series. Tool implementations are stubbed because each one gets its own post.

from google.adk import Agent, Workflow, Event, Context
from google.adk.workflow import node, RetryConfig
from pydantic import BaseModel
from typing import Literal


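# --- Context loading and persistence ---

# Data-access helpers, stubbed here like the tools further down.

async def fetch_customer_profile(customer_id: str):
    """Load the customer profile for this customer."""
    ...

async def fetch_recent_conversations(customer_id: str):
    """Load the customer's recent conversations."""
    ...

async def persist_interaction(customer_id: str, issue: str, resolution: str):
    """Store the issue and its resolution for future conversations."""
    ...
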

@node(retry_config=RetryConfig(max_attempts=2))
async def load_context(ctx: Context, node_input: str):
    """Pull customer profile and recent history into state."""
    customer_id = ctx.state.get("customer_id")
    if customer_id:
        ctx.state["customer_profile"] = await fetch_customer_profile(customer_id)
        ctx.state["customer_history"] = await fetch_recent_conversations(customer_id)
    return node_input


async def save_context(ctx: Context, node_input: str):
    """Persist what was learned after the agent responds."""
    customer_id = ctx.state.get("customer_id")
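    # Only persists when an earlier step has written "diagnosis" into state;
    # nothing in this scaffold sets that key yet.
    diagnosis = ctx.state.get("diagnosis")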
    if customer_id and diagnosis:
        await persist_interaction(
            customer_id=customer_id,
            issue=diagnosis,
            resolution=node_input,
        )
    return node_input


# --- Tools (specifics come in later posts) ---

def search_knowledge_base(query: str):
    """Search the hosting knowledge base for relevant articles."""
    ...

def check_server_status(server_id: str):
    """CPU, memory, disk, and service health for a server."""
    ...

def get_server_logs(server_id: str, service: str, lines: int = 100):
    """Recent log entries for a service."""
    ...

def get_invoices(customer_id: str):
    """Recent invoices and payment status."""
    ...

def restart_service(server_id: str, service: str):
    """Restart a service. Confirmation gated by a callback."""
    ...

async def create_intercom_conversation(email: str, transcript: str):
    """Hand off to a human via Intercom with the full transcript attached."""
    ...


# --- Specialist agents ---

knowledge_agent = Agent(
    name="knowledge_agent",
    model="gemini-2.5-flash",
    instruction=(
        "Answer the hosting question using the knowledge base.\n"
        "Customer: {customer_profile}\n"
        "Always cite the specific KB article. "
        "If you cannot find the answer, say so."
    ),
    tools=[search_knowledge_base],
)

diagnostics_agent = Agent(
    name="diagnostics_agent",
    model="gemini-2.5-pro",
    instruction=(
        "Diagnose this server issue step by step.\n"
        "Customer: {customer_profile}\n"
        "Check server status first, then relevant logs, then correlate "
        "findings before suggesting a fix."
    ),
    tools=[check_server_status, get_server_logs],
)

ops_agent = Agent(
    name="ops_agent",
    model="gemini-2.5-flash",
    instruction=(
        "Execute approved server operations.\n"
        "Customer: {customer_profile}\n"
        "Never run destructive operations without confirmation."
    ),
    tools=[restart_service],
)

billing_agent = Agent(
    name="billing_agent",
    model="gemini-2.5-flash",
    instruction=(
        "Handle billing and plan questions.\n"
        "Customer: {customer_profile}\n"
        "Look up invoices and plan details. "
        "Never process refunds or plan changes. Escalate those."
    ),
    tools=[get_invoices],
)

escalation_agent = Agent(
    name="escalation_agent",
    model="gemini-2.5-flash",
    instruction=(
        "Hand the conversation off to a human support engineer.\n"
        "Customer: {customer_profile}\n"
        "Create an Intercom conversation and attach the full transcript."
    ),
    tools=[create_intercom_conversation],
)


# --- Intent classifier and router ---

class Intent(BaseModel):
    intent: Literal["KNOWLEDGE", "DIAGNOSTICS", "OPS", "BILLING", "ESCALATION"]
    urgency: Literal["LOW", "MEDIUM", "HIGH"]


classify = Agent(
    name="classify",
    model="gemini-2.5-flash",
    instruction=(
        "Classify this hosting support request.\n"
        "Customer: {customer_profile}\n"
        "KNOWLEDGE: how-to and configuration questions.\n"
        "DIAGNOSTICS: performance issues, errors, downtime.\n"
        "OPS: restart services, clear cache, apply approved changes.\n"
        "BILLING: invoices, plans, payments.\n"
        "ESCALATION: customer asks for a human."
    ),
    output_schema=Intent,
    output_key="classification",
)


def route(classification: Intent):
    yield Event(route=classification.intent)


# --- Workflow ---

root_agent = Workflow(
    name="support_router",
    edges=[
        ("START", load_context, classify, route),
        (route, {
            "KNOWLEDGE": knowledge_agent,
            "DIAGNOSTICS": diagnostics_agent,
            "OPS": ops_agent,
            "BILLING": billing_agent,
            "ESCALATION": escalation_agent,
        }),
        (knowledge_agent, save_context),
        (diagnostics_agent, save_context),
        (ops_agent, save_context),
        (billing_agent, save_context),
        (escalation_agent, save_context),
    ],
)

A few things worth pointing out:

load_context and save_context are plain Python nodes, not agents. They run on every conversation. Loading happens before the classifier so every downstream agent has the customer profile in state, available via the {customer_profile} placeholder in instructions. Saving happens after the agent responds, so the next conversation starts with the previous one’s outcome already persisted.
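To make the placeholder mechanics concrete, here is a rough sketch of the idea in plain Python. It is not how ADK implements instruction templating; `render_instruction` and the sample `state` values are hypothetical. The only point is that whatever load_context writes into state is what the specialist's instruction ends up seeing.

```python
import re


def render_instruction(template: str, state: dict) -> str:
    """Replace each {key} with the matching state value; leave unknown keys as-is."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        return str(state[key]) if key in state else match.group(0)
    return re.sub(r"\{(\w+)\}", substitute, template)


# Hypothetical state after load_context has run (values invented for illustration).
state = {"customer_profile": "plan=pro, stack=WordPress/PHP 8.2"}

template = (
    "Answer the hosting question using the knowledge base.\n"
    "Customer: {customer_profile}\n"
)
rendered = render_instruction(template, state)

# A key that was never loaded stays visibly unfilled instead of raising.
unfilled = render_instruction("History: {customer_history}", state)
```

Leaving unknown keys untouched is a design choice made for this sketch only; a real framework may raise or substitute an empty string instead.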

classify uses output_schema=Intent to force a typed response. output_key="classification" writes that response into session state under the name classification, which the route function picks up by parameter name. route then yields an Event(route=...), and the workflow uses the dict in the next edge to pick the right specialist.
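The hand-off from classify to route to a specialist can be illustrated with the standard library alone. This sketch is an assumption about the general mechanism, not a description of ADK internals: `call_with_state` and the `ROUTES` table are hypothetical, a plain dict stands in for the typed Intent, and strings stand in for the specialist agents.

```python
import inspect
from typing import Any, Callable


def call_with_state(fn: Callable[..., Any], state: dict) -> Any:
    """Call fn, filling each parameter from the state key of the same name."""
    params = inspect.signature(fn).parameters
    kwargs = {name: state[name] for name in params}
    return fn(**kwargs)


# Stand-in for the route node: it only returns the route label.
def route(classification: dict) -> str:
    return classification["intent"]


# Hypothetical routing table; strings stand in for the specialist agents.
ROUTES = {
    "KNOWLEDGE": "knowledge_agent",
    "DIAGNOSTICS": "diagnostics_agent",
    "OPS": "ops_agent",
    "BILLING": "billing_agent",
    "ESCALATION": "escalation_agent",
}

# Roughly what output_key="classification" would leave in state (sample values).
state = {"classification": {"intent": "DIAGNOSTICS", "urgency": "HIGH"}}

label = call_with_state(route, state)   # parameter name selects the state key
target = ROUTES[label]                  # label selects the specialist
```

Because the intent is constrained to five literal values, the dictionary lookup cannot miss as long as the classifier's output validates against the schema.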

The @node(retry_config=...) decorator handles the case where the customer profile API is briefly unavailable. ADK 2.0 catches the exception, waits, and retries automatically. Each agent can have its own retry policy, which becomes important once you have five of them and a few external APIs in the mix.
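For readers who have not used a retry policy before, the behaviour described above looks roughly like the sketch below. It is a generic asyncio helper, not ADK's RetryConfig implementation; `with_retry`, its `delay` parameter, and `flaky_profile_api` are hypothetical names introduced for illustration.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retry(
    fn: Callable[[], Awaitable[T]], max_attempts: int = 2, delay: float = 0.01
) -> T:
    """Await fn(); on an exception, wait and try again, up to max_attempts in total."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            await asyncio.sleep(delay * attempt)  # simple linear backoff
    raise RuntimeError("max_attempts must be at least 1")


calls = {"count": 0}


# Hypothetical profile API that fails once, then succeeds.
async def flaky_profile_api() -> dict:
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("profile API briefly unavailable")
    return {"customer_id": "c-123"}


profile = asyncio.run(with_retry(flaky_profile_api))
```

The first call raises, the helper sleeps briefly, and the second call returns the profile, so the caller never sees the transient failure.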

All 32 patterns and where they fit

| # | Pattern | Used? | Where it fits |
| --- | --- | --- | --- |
| P1 | Logits Masking | No | Requires self-hosted models. |
| P2 | Grammar | Yes | Structured JSON for intent classification, diagnostic reports, escalation payloads. |
| P3 | Style Transfer | Yes | Formal for enterprise customers, conversational for small business owners. |
| P4 | Reverse Neutralization | Yes | One diagnostic finding, restyled for the customer vs. for the human handoff note. |
| P5 | Content Optimization | Yes | Preference tuning to figure out which response templates actually resolve conversations faster. |
| P6 | Basic RAG | Yes | Grounding answers in KB articles, platform docs, and resolved conversations. |
| P7 | Semantic Indexing | Yes | “My WordPress keeps crashing” should match PHP memory and OPcache articles, not anything containing “crash”. |
| P8 | Indexing at Scale | Yes | KB articles get updated, new platform features launch, deprecated procedures need to be removed. |
| P9 | Index-Aware Retrieval | Yes | HyDE for vague queries, GraphRAG for connecting related issues. |
| P10 | Node Postprocessing | Yes | Reranking, deduplication, filtering by stack before the LLM sees any context. |
| P11 | Trustworthy Generation | Yes | Citations to specific KB articles, plus confidence detection. |
| P12 | Deep Search | Yes | Multi-hop retrieval when the root cause is three levels deep. |
| P13 | Chain of Thought | Yes | Step-by-step diagnostics. Check status, examine logs, correlate metrics, narrow down. |
| P14 | Tree of Thoughts | No | Chain of Thought covers the main case. ToT adds cost without proportional benefit. |
| P15 | Adapter Tuning | Yes | Fine-tuning for hosting domain vocabulary. |
| P16 | Evol-Instruct | Yes | Generates training data from resolved conversations without manual labeling. |
| P17 | LLM-as-Judge | No | Belongs to a separate QA system that runs after conversations close. Out of scope here. |
| P18 | Reflection | Yes | Draft, critique, revise for responses where wrong technical details can cause data loss. |
| P19 | Dependency Injection | Yes | Swap models and mock server APIs so you can test without touching real infrastructure. |
| P20 | Prompt Optimization | Yes | Systematic updates when models change or new features need updated instructions. |
| P21 | Tool Calling | Yes | Cloudways API and an SSH diagnostic tool for server management, deployment, DNS, SSL, backups. |
| P22 | Code Execution | Yes | Diagnostic scripts. Parse log files, analyze slow query logs, check connection counts. |
| P23 | Multiagent Collaboration | Yes | The Support Router itself. Specialists for knowledge, diagnostics, ops, billing, escalation. |
| P24 | Small Language Model | Yes | Cheap, fast models for intent classification and routing. |
| P25 | Prompt Caching | Yes | The same 50 questions about DNS, SSL, and PHP versions account for a large chunk of traffic. |
| P26 | Inference Optimization | No | Continuous batching and speculative decoding only matter if you self-host. |
| P27 | Degradation Testing | Yes | Measures response quality under load, not just whether the system stays up. |
| P28 | Long-Term Memory | Yes | Remembers customer infrastructure across sessions. Implemented as load_context and save_context. |
| P29 | Template Generation | Yes | Structured diagnostic reports the LLM fills in but cannot fabricate. |
| P30 | Assembled Reformat | Yes | Pull server facts from the API. Let the LLM only handle explanation and formatting. |
| P31 | Self-Check | Yes | Log-probabilities to flag uncertain diagnostic claims. Surface the uncertainty instead of guessing. |
| P32 | Guardrails | Yes | Blocks destructive operations, validates server IDs, prevents the agent from doing things it shouldn’t. |

The four skips are P1, P14, P17, and P26. Two of them require self-hosting. One adds cost without proportional benefit. One belongs to a different system.
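As a taste of what one of the applied patterns looks like in code, here is a minimal sketch of the Guardrails idea (P32) for the restart operation: validate the server ID, restrict the service, and require confirmation before anything runs. The ID format, the allow-list, and every name in the block are assumptions made for illustration; none of this is ADK API or the platform's real rules.

```python
import re

# Assumed ID format and allow-list, for illustration only.
SERVER_ID_PATTERN = re.compile(r"srv-\d+")
RESTARTABLE_SERVICES = {"nginx", "php-fpm", "mysql"}


class GuardrailError(ValueError):
    """Raised when a requested operation fails a pre-execution check."""


def check_restart(server_id: str, service: str, confirmed: bool) -> None:
    """Validate a restart request before any tool is allowed to run."""
    if not SERVER_ID_PATTERN.fullmatch(server_id):
        raise GuardrailError(f"invalid server id: {server_id!r}")
    if service not in RESTARTABLE_SERVICES:
        raise GuardrailError(f"service not allowed: {service!r}")
    if not confirmed:
        raise GuardrailError("restart requires explicit confirmation")


def try_restart(server_id: str, service: str, confirmed: bool) -> str:
    """Return a human-readable verdict instead of executing anything."""
    try:
        check_restart(server_id, service, confirmed)
    except GuardrailError as exc:
        return f"blocked: {exc}"
    return f"allowed: restart {service} on {server_id}"


results = [
    try_restart("srv-42", "nginx", confirmed=True),
    try_restart("srv-42; rm -rf /", "nginx", confirmed=True),
    try_restart("srv-42", "nginx", confirmed=False),
]
```

The checks run in deterministic code outside the model, so a persuasive prompt cannot talk its way past them.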

What’s coming

Each post in the series picks one piece of the system and the patterns that power it:

| Post | Focus | Patterns |
| --- | --- | --- |
| 1 | Overview (this post) | All 32 mapped |
| 2 | Knowledge Agent and RAG | Basic RAG, Semantic Indexing, Node Postprocessing, Trustworthy Generation |
| 3 | Response quality and templates | Style Transfer, Template Generation |
| 4 | Safety for server operations | Self-Check, Guardrails |
| 5 | Diagnostic reasoning | Chain of Thought, Reflection |
| 6 | MCP tools and script execution | Tool Calling, Code Execution |
| 7 | Multi-agent coordination | Multiagent Collaboration |
| 8 | Customer context memory | Long-Term Memory |
| 9 | Testing and reliability | Dependency Injection, Degradation Testing |
| 10 | Full architecture | All 28 patterns composed |

Every post has working code, the tradeoffs I actually ran into building this against live hosting infrastructure, and the places where things didn’t work the first time.

Thanks for reading.