A practical framework to think about modern AI like chemistry — reusable components (LLMs, RAG, agents, guardrails) that combine into predictable reference architectures, including MCP-based tool integration.
Connect: LinkedIn • GitHub • Website
Inspiration / credit: This work is inspired by the IBM-style “AI Periodic Table” explained by Martin Keen (IBM). Video:
TL;DR
Real-world AI products are not “just an LLM.” They are systems assembled from reusable components—like chemistry. A periodic table is valuable not for memorization, but for predicting reactions: which parts combine well, where failures happen, and what validation makes systems trustworthy.
AI discussions often reduce everything to a single ingredient: the model.
But production systems need more than raw generation capability:
This periodic-table framing gives teams a shared language to design systems as compositions of components—with predictable interactions.
A practical system is a reaction that combines elements across columns (often across several rows).
The table helps you ask: Which ingredients do we need to achieve a specific product outcome under constraints (latency, cost, security, compliance)?
| Row | Group 1 (Reactive) | Group 2 (Retrieval) | Group 3 (Orchestration) | Group 4 (Validation) | Group 5 (Models) |
|---|---|---|---|---|---|
| Row 1 — Primitive | Pr Prompts | Em Embeddings | Ch Prompt Chaining | Sc Schemas & Constraints | Lg LLMs |
| Row 2 — Composition | Fc Function Calling | Vx Vector Databases | Rg RAG | Gr Guardrails | Mm Multimodal Models |
| Row 3 — Deployment | Ag Agents | Ft Fine-tuning | Fw Frameworks | Rt Red Teaming | Sm Small Models |
| Row 4 — Emerging | Ma Multi-agent Systems | Sy Synthetic Data | MCP MCP Servers & Protocols | In Interpretability | Th Thinking Models |
Filled blocks in this version:
Each element includes: what it is, when to use it, typical risks, and practical notes.
In real systems, elements are usually combined rather than used alone.
What: Natural language instructions that steer model behavior.
Use when: You need a fast prototype or a controllable behavior layer.
Risks: Prompt injection, brittle phrasing, hidden assumptions.
Practical notes: Treat prompts as versioned artifacts with review + regression tests.
What: Vector representations capturing semantic similarity.
Use when: You need search, clustering, deduplication, or retrieval.
Risks: Domain mismatch, privacy leakage if embeddings are exposed.
Practical notes: Choose embedding models that match your language + domain, and apply access control.
What: Multi-step prompting (decompose → draft → critique → refine) and reusable templates.
Use when: You want more reliable results than a single prompt.
Risks: Latency, compounding errors, debugging complexity.
Practical notes: Keep chains short, add checkpoints, and prefer structured outputs between steps.
What: Enforcing output formats (e.g., JSON schema), allowed values, and validation checks.
Use when: Downstream systems require reliable structure.
Risks: Over-constraint can hurt answer quality; schema drift.
Practical notes: Fail fast, return actionable validation errors, and log violations.
What: General-purpose language models for generation, reasoning, extraction.
Use when: You need broad language capability.
Risks: Hallucinations, bias, prompt sensitivity.
Practical notes: For high-stakes use, pair with retrieval, constraints, and evaluation.
What: A controlled interface for models/agents to request tool execution.
Use when: You need real-time data, actions, or computation.
Risks: Tool misuse, insecure parameter passing, infinite tool loops.
Practical notes: Validate arguments, sandbox tools, rate-limit calls, and log everything.
What: Systems optimized to store/search embeddings at scale.
Use when: You need fast semantic retrieval over large corpora.
Risks: Stale indexes, poor chunking, access control mistakes.
Practical notes: Use document-level ACLs, monitor retrieval hit-rate and relevance.
What: Retrieve relevant context, then generate grounded outputs.
Use when: Knowledge changes often, or you need citations/traceability.
Risks: Wrong retrieval → confident wrong answers; leakage of sensitive docs.
Practical notes: Evaluate retrieval separately (recall/precision) and generation (faithfulness).
What: Runtime policies and safety controls (filters, PII redaction, topic restrictions, refusal rules).
Use when: You must enforce security, privacy, or compliance.
Risks: Over-blocking harms UX; under-blocking increases risk.
Practical notes: Combine with RBAC + audit logs; keep policy rules testable.
What: Models that understand/generate across text + images/audio/video.
Use when: Your task needs visual or audio context (docs, screenshots, videos).
Risks: Sensitive content in media; higher cost/latency.
Practical notes: Add modality-specific preprocessing and redaction.
What: Systems that plan, act via tools, and observe outcomes in a loop.
Use when: Tasks require multi-step operations with external systems.
Risks: Goal drift, infinite loops, unintended actions.
Practical notes: Add budgets, step limits, approval gates, and strong logging.
What: Adapting a base model using supervised or preference optimization.
Use when: You need stable domain behavior, style, or specialized skill.
Risks: Data leakage, governance complexity, catastrophic forgetting.
Practical notes: Prefer adapters (e.g., LoRA), curate data, keep it auditable.
What: Libraries to build chains/graphs/agents/integrations (e.g., orchestration frameworks).
Use when: You want faster assembly and reusable patterns.
Risks: Abstraction leaks, debugging complexity, lock-in.
Practical notes: Keep core logic portable; test boundaries where tools connect.
What: Adversarial testing (jailbreaks, prompt injection, data exfiltration simulation).
Use when: You deploy to untrusted users or handle sensitive data.
Risks: False confidence if test set is narrow.
Practical notes: Maintain an attack library and rerun after every major change.
What: Distilled/specialized models optimized for cost and latency.
Use when: Edge/on-device or high-throughput settings.
Risks: Capability gaps; may require better retrieval.
Practical notes: Use a cascade: small model first, escalate to larger models if needed.
What: Multiple agents with roles (planner, executor, critic) collaborating.
Use when: Decomposition + cross-checking improves outcomes.
Risks: Coordination overhead; conflicting goals.
Practical notes: Define roles clearly and enforce shared-memory boundaries.
What: Generated data to expand training/eval coverage.
Use when: Real data is scarce, expensive, or privacy-restricted.
Risks: Distribution shift, bias reinforcement.
Practical notes: Validate with human review and real-world holdout benchmarks.
What: Model Context Protocol (MCP) servers expose tools and data sources through a standard interface.
Use when: You want one integration pattern across many tools (files, databases, SaaS APIs).
Risks: Over-permissioned tool access; missing audit trails.
Practical notes: Treat MCP servers like production services: auth, RBAC, logging, rate limits.
What: Techniques to understand and explain model decisions.
Use when: Safety, debugging, or compliance needs transparency.
Risks: Explanations can mislead if not validated.
Practical notes: Combine with counterfactual tests and targeted probes.
What: Models that allocate extra compute to reasoning (deliberation) to improve correctness.
Use when: Tasks are complex and errors are costly.
Risks: Higher latency and cost.
Practical notes: Use selective routing: invoke Th only when confidence is low.
A canonical reaction is a reusable architecture pattern written as a formula.
Each pattern below includes MCP servers as a standard interface for tools and data sources.
Formula: Pr + Em + Vx + Rg + Sc + Gr + MCP + Lg (+ Rt)
Use case: Answer questions using internal documentation with citations and access control.
Typical MCP servers:
Reference flow:
Where it breaks: wrong retrieval causes confident wrong answers
Scale note: evaluate retrieval and generation separately; monitor hit-rate and faithfulness
Formula: Pr + Fc + Ag + Fw + MCP + Sc + Gr + Lg (+ Th)
Use case: Book a flight under constraints (budget, dates) with approvals and safe tool execution.
Typical MCP servers:
Reference flow:
Where it breaks: loops / goal drift / unsafe actions without gating
Scale note: budgets, step limits, timeouts, and approvals are non-negotiable
Formula: Pr + Em + Vx + Rg + Fc + MCP + Sc + Gr + Lg
Use case: Generate safe SQL, run it, and summarize results with definitions + caveats.
Typical MCP servers:
Reference flow:
Where it breaks: wrong joins, misleading causal language
Scale note: add query linting, semantic layer, and golden-test questions
Formula: Pr + Em + Vx + Rg + Fc + Ag + MCP + Sc + Gr (+ Ft)
Use case: Understand a repo, propose changes, run tests, open PRs safely.
Typical MCP servers:
Reference flow:
Where it breaks: prompt injection hidden in README/issues; unsafe tool actions
Scale note: isolate tool instructions; enforce allowlists; scan diffs for secrets
Formula: Pr + Ch + Sc + Gr + Mm (+ Sm)
Use case: Generate/edit images from text while enforcing policy and consistent style.
Typical MCP servers:
Reference flow:
Where it breaks: unsafe prompt requests; reproducibility issues
Scale note: version prompts, seeds, and model versions
Formula: Pr + Mm + Sc + Fc + MCP + Gr + Lg
Use case: Extract structured fields from PDFs/images and write to downstream systems.
Typical MCP servers:
Reference flow:
Where it breaks: adversarial PDFs, schema drift, low-confidence fields
Scale note: keep labeled eval set per template + language
Formula: Pr + Em + Vx + Rg + Fc + Ag + MCP + Sc + Gr + Lg
Use case: Classify tickets, retrieve knowledge, draft responses, update CRM safely.
Typical MCP servers:
Reference flow:
Where it breaks: policy violations and hallucinated promises
Scale note: approvals-first rollout; strict allowed-action playbook
Formula: Pr + Rg + Vx + Fc + Ag + MCP + Gr + Rt + Th
Use case: Investigate alerts, gather evidence, produce incident summary.
Typical MCP servers:
Reference flow:
Where it breaks: attacker-controlled text attempting prompt injection
Scale note: read-only by default; strict tool permissions; audit everything
Formula: Pr + Ma + Ag + Fc + Fw + MCP + Rg + Gr + Th
Use case: Agents gather sources, synthesize, and write a report with traceable citations.
Typical MCP servers:
Reference flow:
Where it breaks: citation fabrication, inconsistent claims across agents
Scale note: enforce “every claim must map to retrieved evidence” rules
Formula: Sm + Em + Vx + Rg + MCP + Sc + Gr (+ Ft)
Use case: Low-latency assistant on-device with local retrieval + optional cloud sync.
Typical MCP servers:
Reference flow:
Where it breaks: limited capability vs larger models
Scale note: cascade routing—use Sm first, escalate only with explicit permission
If you’re building GenAI systems (RAG, agents, multimodal, or evaluation tooling), feel free to reach out: LinkedIn • GitHub • Website
Here are some more articles you might like to read next: