<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>System Design on AI Brew</title>
    <link>https://aibrew.ai/tags/system-design/</link>
    <description>Recent content in System Design on AI Brew</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Mon, 25 May 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://aibrew.ai/tags/system-design/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>RAG vs Agents: When to Use Which (With Real Examples from Our Stack)</title>
      <link>https://aibrew.ai/2026/05/rag-vs-agents-when-to-use-which-with-real-examples-from-our-stack/</link>
      <pubDate>Mon, 25 May 2026 00:00:00 +0000</pubDate>
      <guid>https://aibrew.ai/2026/05/rag-vs-agents-when-to-use-which-with-real-examples-from-our-stack/</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — RAG answers from documents. Agents take actions. Most real systems use both: RAG provides context, agents act on it. The hard part isn&amp;rsquo;t picking one — it&amp;rsquo;s knowing which layer of your problem belongs to which pattern.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2 id=&#34;why-this-comparison-matters-right-now&#34;&gt;Why This Comparison Matters Right Now&lt;/h2&gt;
&lt;p&gt;Two things happened in the last six months that make this comparison less academic than it used to be.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;First&lt;/strong&gt;: coding agents crossed a quality threshold around November 2025. Simon Willison&amp;rsquo;s &lt;a href=&#34;https://simonwillison.net/2026/May/19/5-minute-llms/&#34;&gt;five-minute PyCon talk&lt;/a&gt; describes it as the moment agents went from &amp;ldquo;often-work&amp;rdquo; to &amp;ldquo;mostly-work&amp;rdquo; — usable as daily drivers, not just demos. The &amp;ldquo;best model&amp;rdquo; title changed hands five times between Anthropic, OpenAI, and Google in a single month.&lt;/p&gt;</description>
      <content:encoded><![CDATA[<blockquote>
<p><strong>TL;DR</strong> — RAG answers from documents. Agents take actions. Most real systems use both: RAG provides context, agents act on it. The hard part isn&rsquo;t picking one — it&rsquo;s knowing which layer of your problem belongs to which pattern.</p>
</blockquote>
<hr>
<h2 id="why-this-comparison-matters-right-now">Why This Comparison Matters Right Now</h2>
<p>Two things happened in the last six months that make this comparison less academic than it used to be.</p>
<p><strong>First</strong>: coding agents crossed a quality threshold around November 2025. Simon Willison&rsquo;s <a href="https://simonwillison.net/2026/May/19/5-minute-llms/">five-minute PyCon talk</a> describes it as the moment agents went from &ldquo;often-work&rdquo; to &ldquo;mostly-work&rdquo; — usable as daily drivers, not just demos. The &ldquo;best model&rdquo; title changed hands five times between Anthropic, OpenAI, and Google in a single month.</p>
<p><strong>Second</strong>: the model labs themselves are pivoting. Greg Brockman: <em>&ldquo;the model alone is no longer the product.&rdquo;</em> AI21 shuttered its model team to focus on agents. DeepSeek spun up its first &ldquo;Harness team.&rdquo; <a href="https://www.latent.space/p/ainews-all-model-labs-are-now-agent">Latent Space called this</a> <em>&ldquo;all model labs are now agent labs.&rdquo;</em></p>
<p>When the people who train the models start saying the model isn&rsquo;t the product, the question of <em>how</em> you wire models into systems becomes the actual engineering work. RAG and agents are the two dominant answers. They solve different problems, and getting the choice wrong wastes a lot of tokens.</p>
<hr>
<h2 id="the-mental-model">The Mental Model</h2>
<h3 id="rag-retrieve-then-generate">RAG: Retrieve, then Generate</h3>
<p>RAG is a fixed four-step pipeline:</p>
<pre tabindex="0"><code>User query
   │
   ▼
Embedding model → vector
   │
   ▼
Vector DB / search index → top-K relevant chunks
   │
   ▼
Chunks injected into the LLM prompt as context
   │
   ▼
LLM writes one answer, grounded in the retrieved text
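</code></pre><p>One retrieval. One generation. Cheap, predictable, and easy to debug: for a fixed index the retrieval step is deterministic, and when an answer is wrong you can inspect exactly which chunks the model was given.</p>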
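<p>The same four steps as a minimal Python sketch. This is not our production code. <code>embed</code> is a toy bag-of-words stand-in for a real embedding model like bge-m3 (it only matches shared words, not paraphrases), and the final LLM call is left out, since the point is the shape of the pipeline.</p>
<pre tabindex="0"><code class="language-python">import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: word counts instead of a dense vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Steps 1-2: embed the query, rank every chunk by similarity, keep the top K.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    # Step 3: inject the retrieved chunks into the prompt as grounding context.
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


chunks = [
    "the auto-pipeline runs nightly after the build",
    "rate limits are 100 requests per minute",
    "we decided to pause the auto-pipeline until tests are green",
]
query = "what did we decide about the auto-pipeline"
prompt = build_prompt(query, retrieve(query, chunks))
print(prompt)  # step 4 would send this prompt to the LLM in one call
</code></pre>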
<h3 id="agent-reason-then-act-then-reason-again">Agent: Reason, then Act, then Reason Again</h3>
<p>An agent is a reasoning loop:</p>
<pre tabindex="0"><code>User goal
   │
   ▼
┌──────────────────────────────────────────┐
│   LLM reads the goal                      │
│   ↓                                       │
│   Picks a tool (Read, Edit, Bash, ...)    │
│   ↓                                       │
│   Runtime executes the tool               │
│   ↓                                       │
│   Result feeds back to the LLM            │
│   ↓                                       │
│   LLM reasons about what to do next       │
│   ↓                                       │
│   Picks the next tool                     │
│   ↓                                       │
│   ...loop until task is done              │
└──────────────────────────────────────────┘
</code></pre><p>Every iteration burns tokens. Every step can fail. Errors compound across the loop.</p>
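<p>A stripped-down version of that loop, as a sketch. The <code>scripted_llm</code> function stands in for the model, and the <code>Read</code>/<code>Edit</code> tools and the in-memory file dict are hypothetical. What matters is the structure: reason, act, feed the result back, and stop either when the model says it is done or when the step budget runs out.</p>
<pre tabindex="0"><code class="language-python">from typing import Callable, Optional


def read(state: dict, path: str) -> str:
    return state["files"][path]


def edit(state: dict, path: str, old: str, new: str) -> str:
    state["files"][path] = state["files"][path].replace(old, new, 1)
    return "edit succeeded"


TOOLS: dict[str, Callable] = {"Read": read, "Edit": edit}


def scripted_llm(goal: str, history: list) -> Optional[tuple]:
    # Stand-in for the model: choose the next tool call from what has
    # happened so far, or return None when the task looks done.
    if not history:
        return ("Read", {"path": "hugo.yaml"})
    if len(history) == 1:
        return ("Edit", {"path": "hugo.yaml", "old": "mybrew.cc", "new": "aibrew.ai"})
    return None


def run_agent(goal: str, state: dict, llm=scripted_llm, max_steps: int = 10) -> list:
    history = []
    for _ in range(max_steps):                 # hard cap: the loop always terminates
        call = llm(goal, history)              # reason: pick the next tool, or stop
        if call is None:
            break
        name, args = call
        result = TOOLS[name](state, **args)    # act: the runtime executes the tool
        history.append((name, args, result))   # observe: the result feeds back
    return history


state = {"files": {"hugo.yaml": "baseURL: https://mybrew.cc/\n"}}
trace = run_agent("move the site to aibrew.ai", state)
print([step[0] for step in trace])  # ['Read', 'Edit']
</code></pre>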
<hr>
<h2 id="a-concrete-example-of-each">A Concrete Example of Each</h2>
<h3 id="rag-in-action-semantic-wiki-search">RAG in Action: Semantic Wiki Search</h3>
<p>We run a personal knowledge base — about 60 markdown files covering project notes, design decisions, and conversation transcripts. Plain <code>grep</code> doesn&rsquo;t cut it because the question and the answer rarely share keywords.</p>
<p>The solution is an MCP server that wraps a vector search:</p>
<pre tabindex="0"><code>MCP server: wiki-search
  Backend: bge-m3 embedding model
  Storage: cosine similarity index over 60+ markdown files
  Input:   natural-language query (English or Chinese)
  Output:  file path + section title + similarity score
</code></pre><p>When I ask Claude Code <em>&ldquo;what did we decide about the auto-pipeline last month?&rdquo;</em>, this happens:</p>
<ol>
<li>Claude Code recognizes it needs to search the wiki and invokes <code>search_wiki(&quot;auto-pipeline decision&quot;)</code></li>
<li>The query gets embedded into a 1024-dim vector</li>
<li>Cosine similarity returns the top 5 matching sections</li>
<li>The matching chunks get injected into Claude&rsquo;s context</li>
<li>Claude answers, grounded in the actual files, which leaves far less room for hallucination</li>
</ol>
<p>This is RAG end-to-end. The wiki is <strong>passive</strong>. It gets queried; it doesn&rsquo;t act.</p>
<h3 id="agent-in-action-claude-code-editing-files">Agent in Action: Claude Code Editing Files</h3>
<p>Same Claude Code instance. Different problem: <em>&ldquo;Update the baseURL in mybrew/hugo.yaml to aibrew.ai and add a note in TODO.md.&rdquo;</em></p>
<pre tabindex="0"><code>Iteration 1:
  LLM reasoning: &#34;I need to see the current config first.&#34;
  Tool: Read(&#34;mybrew/hugo.yaml&#34;)
  Result: file contents shown

Iteration 2:
  LLM reasoning: &#34;baseURL is on line 1. I&#39;ll edit it.&#34;
  Tool: Edit(old=&#34;https://mybrew.cc/&#34;, new=&#34;https://aibrew.ai/&#34;)
  Result: edit succeeded

Iteration 3:
  LLM reasoning: &#34;Now update TODO.md to mark this done.&#34;
  Tool: Read(&#34;hugo/TODO.md&#34;)
  Result: file contents shown

Iteration 4:
  LLM reasoning: &#34;Need to add the entry under &#39;Domain config&#39;.&#34;
  Tool: Edit(...)
  Result: edit succeeded

Task complete.
</code></pre><p>Four iterations. Four tool calls. Multiple reasoning steps. The agent decided <em>what</em> to do, <em>how</em> to do it, and <em>when</em> it was done — all on its own.</p>
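<p>Iteration 2 could only name the exact old string because iteration 1 read the file. A minimal sketch of an exact-string edit tool shows why that ordering matters. This is a hypothetical design, not Claude Code&rsquo;s actual implementation: it refuses an edit when the old string is missing or ambiguous, so the agent has to go back and read rather than guess.</p>
<pre tabindex="0"><code class="language-python">def edit_text(text: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` with `new`.

    Zero or multiple matches raise an error instead of guessing, which pushes
    the agent to Read the file and choose a unique anchor string.
    """
    count = text.count(old)
    if count == 0:
        raise ValueError(f"old string not found: {old!r}")
    if count > 1:
        raise ValueError(f"old string matches {count} times; include more context")
    return text.replace(old, new, 1)


config = 'baseURL: "https://mybrew.cc/"\ntitle: "AI Brew"\n'
print(edit_text(config, "https://mybrew.cc/", "https://aibrew.ai/"))
</code></pre>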
<h3 id="a-higher-stakes-agent-game-server-control">A Higher-Stakes Agent: Game Server Control</h3>
<p>We also run an agent that controls a Terraria game server through MCP — the bridge exposes ~40 tools (give items, teleport, ban players, spawn bosses, restart server).</p>
<pre tabindex="0"><code>Player in chat: &#34;@ai give me a Zenith&#34;
  → terra_item_lookup(&#34;Zenith&#34;) → resolves to ID 4956
  → terra_give_item(player=&#34;kali&#34;, item=&#34;Zenith&#34;) → SUCCESS
  → Item appears in player&#39;s inventory
</code></pre><p>Compare to a destructive operation:</p>
<pre tabindex="0"><code>Player: &#34;@ai end the world&#34;
  → terra_world_hardmode(confirm=true) requires explicit authorization
  → Refuses without confirmation
  → If confirmed: world permanently enters hardmode (irreversible)
</code></pre><p>This is where the agent pattern gets dangerous. The LLM is now in the driver&rsquo;s seat of a real system. <strong>The blast radius of a wrong tool call is no longer &ldquo;wrong answer&rdquo; — it&rsquo;s &ldquo;wrecked world.&rdquo;</strong> Permission boundaries become first-class design.</p>
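<p>One way to make that boundary structural rather than a matter of prompting is to enforce it in the tool layer. A minimal sketch, with hypothetical names loosely modeled on our <code>terra_*</code> tools: a decorator marks a tool as destructive, and the call fails unless the caller passes <code>confirm=True</code>. Because the check lives in code, a model that ignores its instructions cannot skip it.</p>
<pre tabindex="0"><code class="language-python">import functools


class ConfirmationRequired(Exception):
    """Raised when a destructive tool is called without explicit confirmation."""


def destructive(fn):
    @functools.wraps(fn)
    def wrapper(*args, confirm: bool = False, **kwargs):
        if not confirm:
            raise ConfirmationRequired(f"{fn.__name__} needs explicit confirm=True")
        return fn(*args, **kwargs)
    return wrapper


def give_item(world: dict, player: str, item: str) -> str:
    # Low-stakes tool: no gate needed.
    world.setdefault("inventories", {}).setdefault(player, []).append(item)
    return "SUCCESS"


@destructive
def world_hardmode(world: dict) -> str:
    # High-stakes tool: irreversible in the real game, so it is gated.
    world["hardmode"] = True
    return "world is now in hardmode"


world = {}
print(give_item(world, "kali", "Zenith"))   # SUCCESS
try:
    world_hardmode(world)
except ConfirmationRequired as err:
    print(err)                               # world_hardmode needs explicit confirm=True
print(world_hardmode(world, confirm=True))  # world is now in hardmode
</code></pre>
<p>In a real deployment the <code>confirm</code> flag should be set by a human or a separate authorization check, not by the model deciding to pass it; otherwise the gate only slows the agent down.</p>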
<hr>
<h2 id="the-decision-framework">The Decision Framework</h2>
<p>The one-line rule:</p>
<blockquote>
<p><strong>Use RAG when the answer lives in your documents. Use an agent when the answer requires action.</strong></p>
</blockquote>
<p>Here&rsquo;s the longer version:</p>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>RAG</th>
          <th>Agent</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Goal</strong></td>
          <td>Answer a question</td>
          <td>Complete a task</td>
      </tr>
      <tr>
          <td><strong>Interaction model</strong></td>
          <td>One-shot</td>
          <td>Multi-turn loop</td>
      </tr>
      <tr>
          <td><strong>Token cost</strong></td>
          <td>Low (1× retrieval + 1× generation)</td>
          <td>High (N× reasoning + N× tool calls)</td>
      </tr>
      <tr>
          <td><strong>Latency</strong></td>
          <td>~1–3 seconds</td>
          <td>Seconds to minutes</td>
      </tr>
      <tr>
          <td><strong>Determinism</strong></td>
          <td>High — same query → similar answer</td>
          <td>Low — same goal → different paths</td>
      </tr>
      <tr>
          <td><strong>Debuggability</strong></td>
          <td>Inspect retrieval results</td>
          <td>Trace each reasoning step</td>
      </tr>
      <tr>
          <td><strong>Failure mode</strong></td>
          <td>Wrong/missing context → bad answer</td>
          <td>Tool error compounds → drift</td>
      </tr>
      <tr>
          <td><strong>Blast radius</strong></td>
          <td>Limited to wrong answer</td>
          <td>Touches real systems</td>
      </tr>
      <tr>
          <td><strong>Best for</strong></td>
          <td>Q&amp;A, search, summarization</td>
          <td>Coding, ops, automation, workflows</td>
      </tr>
  </tbody>
</table>
<h3 id="when-you-definitely-want-rag">When You Definitely Want RAG</h3>
<ul>
<li><em>&ldquo;What does our internal API documentation say about rate limits?&rdquo;</em></li>
<li><em>&ldquo;Summarize last week&rsquo;s customer feedback.&rdquo;</em></li>
<li><em>&ldquo;What did the design discussion conclude about authentication?&rdquo;</em></li>
</ul>
<h3 id="when-you-definitely-want-an-agent">When You Definitely Want an Agent</h3>
<ul>
<li><em>&ldquo;Run the test suite and fix any failures.&rdquo;</em></li>
<li><em>&ldquo;Pull yesterday&rsquo;s unread RSS items, pick the three most interesting, and draft a roundup post.&rdquo;</em></li>
<li><em>&ldquo;Refactor this directory to use the new logging API.&rdquo;</em></li>
</ul>
<h3 id="when-you-need-both-most-real-systems">When You Need Both (Most Real Systems)</h3>
<ul>
<li><em>&ldquo;Find the related design doc, then propose a code change consistent with it.&rdquo;</em>
→ RAG to retrieve the doc, agent to make the change.</li>
<li><em>&ldquo;Look up how Pinterest handled MCP auth, then design our auth layer.&rdquo;</em>
→ RAG to gather references, agent to write code.</li>
</ul>
<hr>
<h2 id="hybrid-patterns-rag-powered-agents">Hybrid Patterns: RAG-Powered Agents</h2>
<p>Here&rsquo;s the thing most &ldquo;RAG vs Agent&rdquo; comparisons gloss over: <strong>inside any real agent, RAG is happening at multiple layers</strong>.</p>
<p>A Claude Code session, simplified:</p>
<pre tabindex="0"><code>Session start:
  └─ Load CLAUDE.md into context ............... RAG-on-startup
  └─ Load relevant MEMORY.md files ............. RAG-on-startup

User query:
  └─ Agent reasons about the goal
       │
       ├─ Tool call: search_wiki(&#34;...&#34;) ........ RAG-on-demand
       ├─ Tool call: searxng_web_search(&#34;...&#34;) . RAG-on-demand
       ├─ Tool call: Read(&#34;config.yaml&#34;) ....... Deterministic retrieval
       └─ Tool call: Edit(...) ................. Action
</code></pre><p>The agent loop is the outer shell. RAG calls happen <em>inside</em> the loop, on demand, whenever the agent decides it needs more grounding.</p>
<p>This matches what Pinterest engineers describe in their MCP rollout: the agent surfaces (chat, IDE, CLI) all talk to a common set of MCP servers, some of which are pure retrieval (Presto query, doc search) and some of which are actions (file a ticket, restart a job). The agent decides at runtime which to call.</p>
<hr>
<h2 id="production-case-study-pinterests-mcp-ecosystem">Production Case Study: Pinterest&rsquo;s MCP Ecosystem</h2>
<p>ByteByteGo&rsquo;s writeup of <a href="https://blog.bytebytego.com/p/how-pinterest-built-a-production">Pinterest&rsquo;s MCP rollout</a> is one of the few public accounts of running MCP in production.</p>
<h3 id="the-nm-problem">The N×M Problem</h3>
<p>Pinterest engineers work across many systems daily — Presto for data, Spark for batch jobs, Airflow for workflows, internal docs, ticketing. They wanted AI agents that could reach into these systems directly.</p>
<p>The brute-force math:</p>
<pre tabindex="0"><code>5 agent surfaces × 10 internal tools = 50 bespoke integrations
</code></pre><p>Every new surface or new tool multiplied the work. Plus 50 auth flows, 50 token lifecycles, 50 sets of plumbing.</p>
<h3 id="the-mcp-bet">The MCP Bet</h3>
<p>The Model Context Protocol promised to flatten this:</p>
<pre tabindex="0"><code>5 clients + 10 servers = 15 standardized integrations
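</code></pre><p>One protocol on both sides of every connection. Build one client per surface, wrap each tool in one server, and any client can talk to any server. Adding a surface or a tool now costs one new integration instead of a whole row or column of them.</p>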
<h3 id="what-mcp-doesnt-solve">What MCP Doesn&rsquo;t Solve</h3>
<p>Pinterest&rsquo;s hard-won lesson: the protocol is the easy part. The real engineering went into the <em>surrounding</em> infrastructure:</p>
<table>
  <thead>
      <tr>
          <th>Concern</th>
          <th>Pinterest&rsquo;s Solution</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Discovery</strong></td>
          <td>Central registry of MCP servers — name, version, owner, endpoint</td>
      </tr>
      <tr>
          <td><strong>Auth (Layer 1)</strong></td>
          <td>Service identity — which agent runtime is making this call</td>
      </tr>
      <tr>
          <td><strong>Auth (Layer 2)</strong></td>
          <td>User identity — whose permissions is the agent acting under</td>
      </tr>
      <tr>
          <td><strong>Deployment</strong></td>
          <td>Unified CI/CD pipeline for all MCP servers</td>
      </tr>
      <tr>
          <td><strong>Observability</strong></td>
          <td>Tool-call metrics from day one — usage, latency, error rate</td>
      </tr>
  </tbody>
</table>
<p>The takeaway: <strong>the more capable your agents become, the more your permission and observability layers matter.</strong> A protocol that lets any agent call any tool is also a protocol that lets any compromised agent call any tool.</p>
<p>This is also why our smaller setup (3 MCP servers: <code>searxng</code>, <code>wiki-search</code>, <code>terra_llm_bridge</code>) puts hard <code>confirm=true</code> gates on destructive operations like banning players, restarting the world, or enabling hardmode. Three servers don&rsquo;t need a registry — but they do need authorization.</p>
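<p>To make Pinterest&rsquo;s discovery and two auth layers concrete, here is a hypothetical sketch. The registry fields mirror the table above (name, version, owner, endpoint); the service allowlist, the tool-to-permission map, and every identifier are assumptions for illustration, not Pinterest&rsquo;s actual schema. A call is allowed only if the server is registered, the calling runtime is allowed (layer 1), and the end user holds the permission the tool requires (layer 2).</p>
<pre tabindex="0"><code class="language-python">from dataclasses import dataclass, field


@dataclass
class ServerEntry:
    name: str
    version: str
    owner: str
    endpoint: str
    allowed_services: set = field(default_factory=set)   # layer 1: which runtimes may call
    tool_permissions: dict = field(default_factory=dict)  # layer 2: tool name -> required user permission


REGISTRY = {
    "presto": ServerEntry(
        name="presto",
        version="1.2.0",
        owner="data-platform",
        endpoint="https://mcp.internal.example/presto",
        allowed_services={"ide-agent", "chat-agent"},
        tool_permissions={"run_query": "data:read"},
    ),
}


def authorize(server: str, tool: str, service_id: str, user_perms: set) -> bool:
    entry = REGISTRY.get(server)
    if entry is None:
        return False  # discovery: unregistered servers are not callable
    if service_id not in entry.allowed_services:
        return False  # layer 1: this agent runtime is not trusted for this server
    required = entry.tool_permissions.get(tool)
    # layer 2: the agent acts with the user's permissions, never more
    return required is not None and required in user_perms


print(authorize("presto", "run_query", "ide-agent", {"data:read"}))  # True
print(authorize("presto", "run_query", "cli-agent", {"data:read"}))  # False
</code></pre>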
<hr>
<h2 id="architecture-comparison-claude-code-vs-openclaw">Architecture Comparison: Claude Code vs OpenClaw</h2>
<p>Two of the most popular agent harnesses today take very different stances. ByteByteGo&rsquo;s <a href="https://blog.bytebytego.com/p/ep214-claude-code-vs-openclaw-5-design">EP214</a> breaks them down on five dimensions:</p>
<h3 id="1-system-scope">1. System Scope</h3>
<table>
  <thead>
      <tr>
          <th></th>
          <th>Claude Code</th>
          <th>OpenClaw</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Lifetime</td>
          <td>Short-lived process</td>
          <td>Long-running daemon</td>
      </tr>
      <tr>
          <td>Trigger</td>
          <td>User runs CLI</td>
          <td>WebSocket from Discord/Slack/WhatsApp</td>
      </tr>
      <tr>
          <td>Exit</td>
          <td>After task complete</td>
          <td>Never</td>
      </tr>
  </tbody>
</table>
<p>Claude Code is a workhorse you summon. OpenClaw is a butler that&rsquo;s always listening.</p>
<h3 id="2-agent-runtime">2. Agent Runtime</h3>
<ul>
<li><strong>Claude Code</strong>: single async loop — <code>Think → Tool Call → Observe → Repeat</code>. One task at a time per process.</li>
<li><strong>OpenClaw</strong>: per-session queues. The Gateway demultiplexes incoming messages and dispatches them to separate runtime queues.</li>
</ul>
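<p>The per-session queue idea can be sketched with <code>asyncio</code>. This is a hypothetical illustration of the pattern, not OpenClaw&rsquo;s code: the gateway creates one queue and one worker per session, so different conversations proceed concurrently while messages within a single session are still handled in order.</p>
<pre tabindex="0"><code class="language-python">import asyncio
from collections import defaultdict


class Gateway:
    def __init__(self, handle):
        self.handle = handle  # async callable(session_id, text): one agent turn
        self.queues = {}      # session_id -> asyncio.Queue
        self.workers = []

    async def dispatch(self, session_id: str, text: str) -> None:
        # Demultiplex: route each inbound message to its session's queue,
        # starting a dedicated worker the first time a session appears.
        if session_id not in self.queues:
            self.queues[session_id] = asyncio.Queue()
            self.workers.append(asyncio.create_task(self._worker(session_id)))
        await self.queues[session_id].put(text)

    async def _worker(self, session_id: str) -> None:
        queue = self.queues[session_id]
        while True:
            text = await queue.get()
            await self.handle(session_id, text)  # one message at a time per session
            queue.task_done()

    async def drain(self) -> None:
        # Wait until every queued message is handled, then stop the workers.
        for queue in self.queues.values():
            await queue.join()
        for task in self.workers:
            task.cancel()


async def main() -> dict:
    log = defaultdict(list)

    async def handle(session_id, text):
        await asyncio.sleep(0)  # stand-in for an agent turn
        log[session_id].append(text)

    gateway = Gateway(handle)
    for session_id, text in [("discord:1", "hi"), ("slack:7", "status?"), ("discord:1", "thanks")]:
        await gateway.dispatch(session_id, text)
    await gateway.drain()
    return dict(log)


result = asyncio.run(main())
print(result)
</code></pre>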
<h3 id="3-extension-model">3. Extension Model</h3>
<ul>
<li><strong>Claude Code</strong>: Four extension primitives, all hooking into the same agent loop:
<ul>
<li><strong>MCP</strong> (external tool servers)</li>
<li><strong>Plugins</strong> (bundled tool sets)</li>
<li><strong>Skills</strong> (named procedures the model can invoke)</li>
<li><strong>Hooks</strong> (event-driven shell commands)</li>
</ul>
</li>
<li><strong>OpenClaw</strong>: Manifest-first plugins. All plugins go through a central Registry before being made available to the Agent.</li>
</ul>
<h3 id="4-memory">4. Memory</h3>
<ul>
<li><strong>Claude Code</strong>: <code>CLAUDE.md</code> files from the working directory and its parents are loaded into context at session start. A <code>CLAUDE.md</code> in a subdirectory is pulled in on demand, when the agent reads files in that subtree.</li>
<li><strong>OpenClaw</strong>: <code>MEMORY.md</code> separated from daily notes. Hybrid vector + keyword search across structured sections.</li>
</ul>
<h3 id="5-multi-agent-topology">5. Multi-Agent Topology</h3>
<ul>
<li><strong>Claude Code</strong>: Lead → subagent pattern. Main agent delegates work to spawned subagents.</li>
<li><strong>OpenClaw</strong>: Route-and-delegate. Inbound channels route to dedicated agents that hand off to shared subagents.</li>
</ul>
<p>The deeper pattern: <strong>Claude Code optimizes for &ldquo;one session, one task.&rdquo;</strong> OpenClaw optimizes for &ldquo;many concurrent conversations, ambient presence.&rdquo; Both are correct for their respective use cases. Don&rsquo;t pick the wrong one for yours.</p>
<hr>
<h2 id="failure-modes-and-anti-patterns">Failure Modes and Anti-Patterns</h2>
<h3 id="rag-failure-modes">RAG Failure Modes</h3>
<p><strong>1. Retrieval misses the relevant chunk.</strong> Your embedding model thinks the question and the answer are semantically distant when they aren&rsquo;t. Mitigation: hybrid search (vector + keyword), reranking, query expansion.</p>
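<p>Hybrid search needs a way to merge a vector ranking with a keyword ranking whose raw scores are not comparable. One common choice, sketched here, is reciprocal rank fusion (RRF), which uses only each document&rsquo;s position in each list. The constant <code>k = 60</code> is the value usually quoted for RRF, and the file names are made up.</p>
<pre tabindex="0"><code class="language-python">def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each document scores sum(1 / (k + rank))
    over every ranked list it appears in; higher is better."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


vector_hits = ["auth-design.md", "oauth-notes.md", "rate-limits.md"]
keyword_hits = ["token-refresh.md", "auth-design.md"]
print(rrf([vector_hits, keyword_hits]))
# ['auth-design.md', 'token-refresh.md', 'oauth-notes.md', 'rate-limits.md']
</code></pre>
<p><code>auth-design.md</code> wins because both retrievers found it, and <code>token-refresh.md</code>, which only the keyword search caught, still outranks the lower vector-only hits. That is exactly the case pure vector search misses.</p>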
<p><strong>2. Retrieval returns too many irrelevant chunks.</strong> Context window fills with noise. Mitigation: stricter top-K, similarity threshold, post-retrieval filtering.</p>
<p><strong>3. The answer isn&rsquo;t actually in your corpus.</strong> RAG can&rsquo;t fabricate truth — if the knowledge isn&rsquo;t indexed, the model still doesn&rsquo;t know. Mitigation: a confidence check, or a fallback to web search.</p>
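<p>The mitigations for failure modes 2 and 3 fit in a few lines: cap the count, drop weak matches, and treat an empty result as &ldquo;not in the corpus&rdquo; rather than generating anyway. A sketch; the 0.5 threshold and the <code>fallback</code> callback (for example, a web search) are placeholders to tune for your own embedding model.</p>
<pre tabindex="0"><code class="language-python">from typing import Callable


def select_context(scored: list[tuple[str, float]], top_k: int = 5, min_score: float = 0.5) -> list[tuple[str, float]]:
    """Keep at most top_k chunks, and only those whose similarity clears min_score."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [(chunk, score) for chunk, score in ranked[:top_k] if score >= min_score]


def context_or_fallback(scored: list[tuple[str, float]], fallback: Callable[[], list[str]]) -> list[str]:
    context = select_context(scored)
    if not context:
        # Nothing relevant enough: the answer is probably not indexed,
        # so fetch grounding elsewhere instead of letting the model improvise.
        return fallback()
    return [chunk for chunk, _ in context]


hits = [("rate limits: 100 req/min", 0.82), ("changelog 2024", 0.31), ("auth overview", 0.64)]
print(context_or_fallback(hits, fallback=lambda: ["web result"]))
print(context_or_fallback([("unrelated", 0.12)], fallback=lambda: ["web result"]))
</code></pre>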
<p><strong>4. Chunking destroyed the structure.</strong> You split a markdown file mid-table, mid-code-block, mid-argument. Mitigation: structure-aware chunking (by heading, by paragraph, by semantic unit).</p>
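<p>A minimal structure-aware chunker, assuming markdown input: it starts a new chunk at each heading and never splits inside a fenced code block, so code and tables stay whole within their section. Real chunkers usually also cap chunk size; that is omitted here.</p>
<pre tabindex="0"><code class="language-python">import re

FENCE = "`" * 3  # a markdown code-fence marker
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_heading(markdown: str) -> list[dict]:
    chunks, section, lines, in_fence = [], "(preamble)", [], False

    def flush():
        # Emit the lines collected so far as one chunk, if non-empty.
        text = "\n".join(lines).strip()
        if text:
            chunks.append({"section": section, "text": text})

    for line in markdown.splitlines():
        if line.startswith(FENCE):
            in_fence = not in_fence
        # A "# ..." line inside a code fence is code (e.g. a shell comment), not a heading.
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            section, lines = match.group(2), []
        else:
            lines.append(line)
    flush()
    return chunks


doc = "# Rate limits\n100 req/min\n## Example\n" + FENCE + "bash\n# retry with backoff\ncurl ...\n" + FENCE + "\n"
print([c["section"] for c in chunk_by_heading(doc)])  # ['Rate limits', 'Example']
</code></pre>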
<h3 id="agent-failure-modes">Agent Failure Modes</h3>
<p><strong>1. Reasoning drift.</strong> The agent gets stuck in a loop, repeatedly trying variations of the same failed approach. Mitigation: max-step limits, distinct-tool-call constraints, explicit &ldquo;what have I tried&rdquo; memory.</p>
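<p>The &ldquo;what have I tried&rdquo; memory and a repeat limit can live in the loop runtime rather than in the prompt. A hypothetical sketch: the guard records every tool call, and once the identical call (same tool, same arguments) shows up more than <code>max_repeats</code> times, it stops the loop and reports what was tried, which is a natural point to escalate to a human.</p>
<pre tabindex="0"><code class="language-python">from collections import Counter


class DriftError(RuntimeError):
    pass


class DriftGuard:
    def __init__(self, max_repeats: int = 2):
        self.max_repeats = max_repeats
        self.counts = Counter()
        self.tried = []  # the "what have I tried" log, in order

    def check(self, tool: str, args: dict) -> None:
        # Identify a call by tool name plus its (hashable) argument values.
        key = (tool, tuple(sorted(args.items())))
        self.counts[key] += 1
        self.tried.append(f"{tool}({args})")
        if self.counts[key] > self.max_repeats:
            raise DriftError(f"{tool} called {self.counts[key]} times with the same args; tried: {self.tried}")


guard = DriftGuard(max_repeats=2)
guard.check("Bash", {"cmd": "pytest tests/test_api.py"})
guard.check("Edit", {"path": "api.py"})
guard.check("Bash", {"cmd": "pytest tests/test_api.py"})
try:
    guard.check("Bash", {"cmd": "pytest tests/test_api.py"})
except DriftError as err:
    print("stopping:", err)
</code></pre>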
<p><strong>2. Permission overreach.</strong> The agent does too much: it was asked to fix one test, and it refactored half the file. Mitigation: explicit scope in the prompt, narrow tool permissions, human-in-the-loop for destructive ops.</p>
<p><strong>3. Tool-call cascade failure.</strong> A single bad tool call (e.g., a malformed path) gets followed by five reasoning steps trying to &ldquo;fix&rdquo; the symptom rather than the root cause. Mitigation: clear error messages from tools, &ldquo;try once then escalate&rdquo; tool design.</p>
<p><strong>4. Spending money on the wrong thing.</strong> A 20-step agent loop costs at least 20× a single LLM call, and usually more, because each step re-sends the growing conversation as input. If RAG would have answered the question, you just paid that multiple to get a worse answer. Mitigation: ask &ldquo;could this be a single retrieval?&rdquo; before going to agent mode.</p>
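<p>A back-of-the-envelope sketch of why a step count understates agent cost. It assumes, as is typical with stateless LLM APIs, that every step re-sends the whole conversation; the numbers are made up (a 2,000-token starting prompt, 500 tokens of tool call plus result added per step), and output tokens and prompt caching are ignored.</p>
<pre tabindex="0"><code class="language-python">def loop_input_tokens(steps: int, base_prompt: int, tokens_per_step: int) -> int:
    # Step i re-sends the base prompt plus everything appended in the i steps before it.
    return sum(base_prompt + i * tokens_per_step for i in range(steps))


single_call = 2_000
agent_total = loop_input_tokens(steps=20, base_prompt=2_000, tokens_per_step=500)
print(agent_total, agent_total / single_call)  # 135000 67.5
</code></pre>
<p>Under these assumptions the loop reads about 67× the input of one RAG call, not 20×. Prompt caching can cut the price of the repeated prefix, but not the latency or the context growth.</p>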
<h3 id="the-worst-anti-pattern-agent-when-rag-works">The Worst Anti-Pattern: Agent-When-RAG-Works</h3>
<p>The single most expensive mistake teams make: building an agent for a problem that&rsquo;s actually a search problem.</p>
<p>If your users are asking <em>&ldquo;where in the docs does it say…&rdquo;</em>, you don&rsquo;t need an agent. You need a search box wired to a vector index. Stop spending tokens on multi-step reasoning to find something a single retrieval call would surface.</p>
<hr>
<h2 id="what-this-means-for-builders">What This Means for Builders</h2>
<p>A practical checklist if you&rsquo;re starting a new AI feature:</p>
<ol>
<li><strong>Frame the problem as a verb.</strong> <em>&ldquo;Answer questions about X&rdquo;</em> → RAG. <em>&ldquo;Do X on behalf of the user&rdquo;</em> → agent.</li>
<li><strong>If you can answer it with one retrieval, do.</strong> Cheaper, faster, more predictable.</li>
<li><strong>If you go agent, design permissions on day one.</strong> Not day fifty. Pinterest&rsquo;s two-layer auth wasn&rsquo;t a feature — it was a survival requirement.</li>
<li><strong>Plan for hybrid.</strong> Real agents will need RAG-style retrieval inside their loop. Pick a protocol (MCP is the obvious default) and stick to it.</li>
<li><strong>Instrument everything.</strong> Tool call counts, retrieval hit rates, drift indicators. You can&rsquo;t tune what you can&rsquo;t see.</li>
<li><strong>Set a budget per task.</strong> Both in tokens and in iterations. Agents without budgets find creative ways to spend forever on the wrong thing.</li>
</ol>
<hr>
<h2 id="closing-thought">Closing Thought</h2>
<p>The RAG-versus-agent framing made sense in 2023, when these were two distinct paradigms competing for the same job. In 2026, they&rsquo;re complementary layers of the same system.</p>
<p>The interesting question isn&rsquo;t <em>which one to use</em>. It&rsquo;s <em>which slice of your problem belongs in which layer</em>. Get that division right and you ship something useful. Get it wrong and you&rsquo;ll spend a quarter rebuilding it.</p>
<p>For most teams shipping today, the answer looks like this:</p>
<pre tabindex="0"><code>                ┌───────────────────────────────┐
                │      Agent loop (outer)        │
                │   reasoning + tool selection   │
                └──────────┬────────────────────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
        ▼                  ▼                  ▼
   RAG retrieval     Action tools       Computation
   (knowledge)       (mutate state)     (math, code)
</code></pre><p>Agent decides. RAG informs. Tools act. That&rsquo;s the whole stack.</p>
<hr>
<p><em>References</em></p>
<ul>
<li><em><a href="https://blog.bytebytego.com/p/ep216-rags-vs-agents">ByteByteGo EP216 — RAGs vs Agents</a></em></li>
<li><em><a href="https://blog.bytebytego.com/p/how-pinterest-built-a-production">ByteByteGo — How Pinterest Built a Production MCP Ecosystem</a></em></li>
<li><em><a href="https://blog.bytebytego.com/p/ep214-claude-code-vs-openclaw-5-design">ByteByteGo EP214 — Claude Code vs. OpenClaw: 5 Design Dimensions</a></em></li>
<li><em><a href="https://simonwillison.net/2026/May/19/5-minute-llms/">Simon Willison — The Last Six Months in LLMs in Five Minutes</a></em></li>
<li><em><a href="https://www.latent.space/p/ainews-all-model-labs-are-now-agent">Latent.Space — All Model Labs Are Now Agent Labs</a></em></li>
</ul>
]]></content:encoded>
    </item>
  </channel>
</rss>
