AI Agents from First Principles

Transformers, LLMs, MCP, orchestration, evals — there's too much jargon out there. In this talk, we build a mental model from scratch, starting from the tiniest transformer block all the way to multi-agent systems, so you can understand what's happening and keep up with new releases.

By Pranav Dhingra · March 22, 2026

Tools Used

None - just curiosity!


Download slides (PDF) →

This talk was given at EmpireHacks 2026, Cornell Tech's hackathon. There's a problem in the AI landscape today: there's just too much going on. An endless stream of jargon, LinkedIn hype posts, and Twitter threads that make it genuinely difficult to make sense of what's happening. The goal of this talk is to give you a mental model, built from first principles and layer by layer, so that you can parse any new release, any new buzzword, and immediately understand where it fits.

We start from the smallest possible building block and work our way up to multi-agent systems. Think of it as a layer cake: each layer builds on the one below it.


Layer 1: Attention

Attention is the foundation of everything. It's a piece of math that tells the model which words are most relevant when predicting the next word.

Large language models are fundamentally next token prediction systems. But to predict the right next token, the model needs to know which earlier words matter most. In "The cat sat on the mat," the word "cat" is far more important than "the." "The" appears everywhere and carries little meaning. The attention mechanism is what helps the model learn this kind of context.

Instead of naively predicting the next word based on just the previous few words, attention lets the model say: "This part of the sentence, whether it was the last sentence or ten sentences ago, is especially relevant to what I'm about to predict."
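The core computation can be sketched in a few lines. This is a toy, single-query version of scaled dot-product attention with made-up 2-d vectors; real models use learned, high-dimensional queries, keys, and values across many positions at once:

```python
import math

def softmax(scores):
    # Exponentiate (shifted by the max for numerical stability) and normalize.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query position."""
    d = len(query)
    # Score each earlier token by how well its key matches the query.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Blend the value vectors, weighted by relevance.
    blended = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return blended, weights

# Toy 2-d vectors: the query lines up with "cat", not "the".
keys = [[1.0, 0.0],   # "the"
        [0.0, 1.0]]   # "cat"
values = [[0.1, 0.1],
          [0.9, 0.9]]
query = [0.0, 2.0]
blended, weights = attention(query, keys, values)
```

Here "cat" ends up with most of the attention weight, so its value vector dominates the blend: exactly the "this word matters more" behavior described above.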

Jargon to know:

  • Attention: the mechanism that identifies which tokens matter for the current prediction
  • Weights / Parameters: the learnable variables inside a model that capture what it has learned during training. When someone says a model has "open weights," they mean the trained model is free to download and use.
  • Tokens: the smallest chunks a language model works with. You can think of a token as roughly a word, though words can be broken into sub-word pieces (e.g., "presentation" might become "present" + "ation")

Layer 2: Transformers

Take a bunch of attention mechanisms and stack them together with a feed-forward network, and you get a transformer block.

More attention heads mean more weights, which means a bigger "brain." The model can capture more information and richer context about what came before. That's really all a transformer is: attention and feed-forward layers stacked together.
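The block structure can be sketched as "attention, then feed-forward, each with a residual connection and normalization." This is a simplified pure-Python skeleton with made-up weights, and the attention output is passed in rather than recomputed; real blocks differ in many details (multiple heads, learned norms, different activation functions):

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize a vector to zero mean and unit variance.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def feed_forward(x, w_up, w_down):
    # Two linear maps with a ReLU in between.
    hidden = [max(0.0, sum(xi * wi for xi, wi in zip(x, row))) for row in w_up]
    return [sum(hi * wi for hi, wi in zip(hidden, row)) for row in w_down]

def transformer_block(x, attended, w_up, w_down):
    """One block: attention sub-layer, then feed-forward, each with a residual add."""
    # `attended` stands in for the output of the attention mechanism from Layer 1.
    h = layer_norm([a + b for a, b in zip(x, attended)])
    return layer_norm([a + b for a, b in zip(h, feed_forward(h, w_up, w_down))])

# Tiny example: 2-d input, 3-d hidden layer, 2-d output.
out = transformer_block(
    x=[1.0, 2.0],
    attended=[0.5, 0.5],
    w_up=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    w_down=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
)
```

Every weight in `w_up` and `w_down` (and in the attention mechanism) is a learnable parameter; stacking more blocks multiplies the count.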

Jargon to know:

  • Transformer: a stack of attention and feed-forward layers; the architecture behind every modern language model
  • Parameters: used interchangeably with weights in casual conversation. "How many parameters does the model have?" is just asking how big its brain is.
  • Neural network: a type of machine learning model; transformers are a specific kind of neural network


Layer 3: Large Language Models (LLMs)

Take multiple transformer blocks and stack them on top of each other. It just gets bigger and bigger: more learnable parameters, more capacity.

The scale matters:

  • Millions of parameters → can complete sentences reasonably well
  • Billions of parameters → can write coherent paragraphs
  • Hundreds of billions+ → the large language models we use today

Every time you see a new model release ("GPT-5.4 with 7 billion parameters" or "Mistral 7B"), that number is just telling you the size of the model's brain. It's a company's way of saying: look how big our model is, look how much time we spent training it.

The LLM landscape today includes OpenAI (GPT), Anthropic (Claude), Google (Gemini), Meta (Llama, open weights), DeepSeek (open weights), Mistral, and Grok (xAI).

Jargon to know:

  • LLM: a very large transformer-based language model
  • Pre-training: training the model to do next token prediction really well. It's called "pre" because there are more training phases after this one.
  • Context window: how many tokens the model can process at once. Claude has 200K and 1M token context window models. When you hit the limit, you need to start a new session, clear context, or compact your conversation.
  • Inference: just running a model to predict the next token. That's it. "We're building GPUs for inference" = "We're building GPUs to run models." Don't be intimidated by this word. It's fancy VC lingo at this point.

Layer 4: Post-Training

A model that's great at next token prediction isn't necessarily a great assistant. It's just autocomplete. Post-training is what turns an LLM into something helpful.

Supervised Fine-Tuning (SFT)

You show the model labeled examples of what a good assistant response looks like. This is where the model learns to format responses with a helpful tone, like the "Sure, absolutely!" before answering your question. When someone says "the difference between Claude and ChatGPT is probably just the fine-tuning," this is what they mean: different training data produces different personalities.
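A single SFT training example is just a prompt paired with a labeled "good" response. The exact format varies by lab; this chat-message structure is illustrative:

```python
# A hypothetical SFT training example: the user turn is the input,
# the assistant turn is the labeled "good" response.
sft_example = {
    "messages": [
        {"role": "user", "content": "What's the capital of France?"},
        {"role": "assistant",
         "content": "Sure, absolutely! The capital of France is Paris."},
    ]
}
# During fine-tuning, the next-token loss is typically computed only on the
# assistant's tokens, so the model learns the tone and format of good
# responses, not just the underlying facts.
```

Thousands of examples like this, written in a consistent voice, are what give each assistant its personality.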

Reinforcement Learning from Human Feedback (RLHF)

Humans are shown two model responses and asked which one they prefer. That preference signal gets fed back into the model, steering it toward better responses.

You may have seen this yourself. Apps like Granola or ChatGPT occasionally ask "Which response do you prefer, A or B?" That's them collecting feedback for their RL pipeline.

Getting quality human-labeled data is a genuine problem. Companies pay for expert annotators, but data quality varies widely. There's no perfect solution, and it's something every company is constantly working on.

Jargon to know:

  • Fine-tuning: additional training to specialize a model for a particular task or behavior
  • Supervised fine-tuning: fine-tuning with labeled examples
  • RLHF: reinforcement learning from human feedback
  • Post-training: the umbrella category for all training after pre-training
  • Alignment: post-training a model toward a certain goal. Safety alignment = training the model to refuse harmful requests. Whenever you see articles about "alignment" on the Anthropic website, this is what they're talking about.


Layer 5: Reasoning

How do you make a model think harder? There are three approaches, and they're often confused:

Chain of Thought Prompting

You explicitly tell the model: "Think step by step." The model then shows its work ("First I divide this, then I add these numbers..."). Research shows this actually improves performance significantly. The key is that the reasoning is visible to you in the output.
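The technique is literally one extra instruction in the prompt. A minimal sketch (the question and wording are made up for illustration):

```python
question = "A train travels 60 miles in 1.5 hours. What is its average speed?"

# The only difference between the two prompts is one instruction;
# the model's visible output changes dramatically.
plain_prompt = question
cot_prompt = (
    f"{question}\n"
    "Think step by step. Show your reasoning, then state the final answer."
)
```

With the plain prompt the model just emits an answer; with the chain-of-thought prompt it writes out the division first, and you can inspect that reasoning.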

Reasoning Models

Instead of prompting the model to think, you train it to think. The fine-tuning data includes step-by-step reasoning as the correct answer. So for 2 + 2, the correct training answer isn't just "4." It's "First I take 2, then the other 2, then I look at the operator, and I get 4." The chain of thought is invisible, baked into the model's behavior.

Extended Thinking

Reasoning models could keep thinking forever, burning through tokens. Extended thinking sets a budget for how long the model can reason. You're trading tokens and time for (potentially) better quality. This is why enabling extended thinking in Claude blows through your token budget.

Jargon to know:

  • Chain of thought: visible step-by-step reasoning in the output
  • Reasoning model: a model explicitly fine-tuned to reason internally
  • Extended thinking: a token budget (or toggle) that controls how long a model can reason


Layer 6: Tool Calling

This is where things get interesting. Tools enable a smart model to do things in the real world.

The flow: user asks "What's the weather in NYC?" → the model decides to call a get_weather function → the system executes the function → the result goes back to the model → the model responds in natural language.

A critical distinction: the model doesn't actually call the tool. The model is still just a next token prediction system. It says "I think we should call this tool," and then the surrounding system (the harness) handles the actual execution, error handling, and piping results back. The model reasons about which tool to call; the harness does the work.
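That flow can be simulated end to end. Here `fake_model` is a hypothetical stand-in for the LLM (no real API): it only emits structured text saying which tool to call, and the surrounding code does the actual execution:

```python
import json

def fake_model(messages):
    """Stand-in for an LLM: it only emits text/structured requests, never runs code."""
    if messages[-1]["role"] == "user":
        # The model "decides" a tool is needed and names it.
        return {"tool_call": {"name": "get_weather", "arguments": {"city": "NYC"}}}
    # With the tool result now in context, it answers in natural language.
    result = json.loads(messages[-1]["content"])
    return {"text": f"It's {result['temp_f']}F in {result['city']}."}

def get_weather(city):
    return {"city": city, "temp_f": 48}  # canned data for the sketch

TOOLS = {"get_weather": get_weather}

messages = [{"role": "user", "content": "What's the weather in NYC?"}]
reply = fake_model(messages)
if "tool_call" in reply:
    call = reply["tool_call"]
    # The surrounding system, not the model, executes the function.
    result = TOOLS[call["name"]](**call["arguments"])
    messages.append({"role": "tool", "content": json.dumps(result)})
    reply = fake_model(messages)
```

Note that the model is called twice: once to choose the tool, once to turn the tool's result into a natural-language answer.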

Jargon to know:

  • Tool calling / Function calling: same thing; giving a model access to external functions
  • API (Application Programming Interface): just another way to call a function. You hit an endpoint on the web, it does the work, and it returns a response.


Layer 7: The Agent Loop

Take a smart model, give it tools, and put it in a loop. That's an agent.

This is commonly called the ReAct loop (Reason + Action):

  1. Think — the model assesses where it is and what needs to happen next
  2. Act — it makes a tool call based on its reasoning
  3. Observe — it checks the results. Did the API return properly? Is the output well-formatted? Is the task complete?
  4. Loop — if the task isn't done, go back to step 1

It's essentially a glorified while loop. The agent keeps working, checking, and retrying until the task is complete or it hits a timeout.
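The steps above really do reduce to a loop. A minimal sketch, with a toy model and tool swapped in for a real LLM and real APIs:

```python
def agent_loop(task, model, tools, max_steps=10):
    """The glorified while loop: think, act, observe, repeat."""
    history = [task]
    for _ in range(max_steps):              # step limit doubles as a timeout
        decision = model(history)           # 1. Think
        if decision["done"]:
            return decision["answer"]
        result = tools[decision["tool"]](**decision["args"])  # 2. Act
        history.append(result)              # 3. Observe, then 4. Loop
    return "stopped: hit the step limit"

# Toy model and tool to exercise the loop.
def toy_model(history):
    if len(history) == 1:
        return {"done": False, "tool": "add", "args": {"a": 2, "b": 2}}
    return {"done": True, "answer": f"The result is {history[-1]}"}

answer = agent_loop("what is 2 + 2?", toy_model, {"add": lambda a, b: a + b})
```

Swap `toy_model` for a real LLM call and `tools` for real functions, and this is the skeleton of every coding agent.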

Agents today (Claude Code, Cursor, GitHub Copilot, Devin, Codex) can run locally on your machine, in the cloud on a virtual machine, or in hybrid mode where you start locally and shift to the cloud. Most of them support all three at this point. You can even communicate with ongoing cloud tasks from your phone.

Jargon to know:

  • Agent: a model with tools in a loop
  • ReAct: reason and action; the standard agent loop pattern
  • Agentic: if someone puts "agentic" in front of a word, they just mean it uses this agent loop. "Agentic legal assistant" = agent loop applied to law.

Layer 8: The Agent Harness

If the model isn't calling tools directly, what is? The harness.

The harness is all the code wrapped around the model that makes an agent work. Think of Claude Code: it's a harness wrapped around API calls to the Claude model. It takes your input, combines it with a system prompt, passes it to the model, and then handles everything the model can't do on its own:

  • Validates tool calls: models hallucinate. What if the tool is called get_weather but the model says get_weather_tomorrow? The harness maps and corrects.
  • Executes tools: actually runs the function calls
  • Handles errors: retries failed calls, manages timeouts
  • Enforces guardrails: safety checks, token limits, cost tracking, authentication

You can't just put an LLM out in the wild and expect it to work correctly. You need guardrails, error checking, and mapping from what the model says to what actually happens. That's the harness.
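One sliver of that work, validating a possibly hallucinated tool name and retrying failures, can be sketched like this. The fuzzy-matching approach is illustrative, not how any particular harness actually does it:

```python
import difflib

def run_tool(requested, args, tools, retries=2):
    """Validate the model's requested tool name, then execute with retries."""
    # Map a hallucinated "get_weather_tomorrow" onto the registered
    # "get_weather" instead of crashing.
    match = difflib.get_close_matches(requested, tools, n=1, cutoff=0.5)
    if not match:
        return {"error": f"unknown tool: {requested}"}
    last_error = None
    for _ in range(retries + 1):
        try:
            return {"result": tools[match[0]](**args)}
        except Exception as exc:   # retry transient failures
            last_error = str(exc)
    return {"error": last_error}

tools = {"get_weather": lambda city: f"48F in {city}"}
out = run_tool("get_weather_tomorrow", {"city": "NYC"}, tools)
```

Multiply this by guardrails, cost tracking, authentication, and context management, and you get a sense of how much code sits around the model.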

Here's the interesting part: harnesses are interchangeable with models. Claude Code is a harness for Claude, but you could plug in a different model (like Kimi) and use the same harness. Everything (tool management, error validation, the agent loop) stays the same. You just swap the brain.

Jargon to know:

  • Harness / Scaffolding / Framework / SDK: all roughly the same thing. The code around the model that makes it an agent. Different terms, same concept.


Layer 9: Multi-Agent Orchestration

One agent not enough? Run multiple agents on different tasks, potentially talking to each other.

Orchestration is like directing an orchestra: telling various agents what to do and how to work together.

The most common pattern is hub and spoke: one main agent (the orchestrator) delegates tasks to sub-agents. When you use Claude Code, you're the main thread. You ask it to do something, and it dispatches multiple agents to read the codebase, make changes, and report back.

There's also agent-to-agent coordination (agent teams), where agents skip the central orchestrator and talk directly to each other through a shared task list. This uses more tokens but can produce higher quality output from multiple perspectives.
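The hub-and-spoke pattern is simple to sketch. Here each "sub-agent" is a stub closure standing in for a full model-plus-tools-plus-loop agent:

```python
def orchestrate(task, subtasks, spawn_agent):
    """Hub and spoke: the orchestrator splits the task, delegates, collects reports."""
    reports = {}
    for subtask in subtasks:          # real systems often run these in parallel
        agent = spawn_agent(subtask)  # each sub-agent is its own model + tools + loop
        reports[subtask] = agent()
    return reports

# Toy sub-agents that just report on their assignment.
reports = orchestrate(
    "review the pull request",
    ["read the diff", "run the tests"],
    lambda subtask: (lambda: f"done: {subtask}"),
)
```

In a real system, the orchestrator is itself an agent: it decides the subtask split with a model call, then merges the reports into a final answer.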

Jargon to know:

  • Multi-agent orchestration: managing multiple agents working on multiple tasks
  • Sub-agent: an agent spawned by another agent
  • Hub and spoke: one orchestrator delegates to many workers


Layer 10: Communication Protocols

How do agents talk to tools and to each other? Through open standards: the internet agreeing to do things the same way, like putting USB-C on every phone.

  • MCP (Model Context Protocol): agents communicating with tools. An MCP server exposes a set of capabilities (like a weather API), and the agent decides when to use them.
  • A2A (Agent-to-Agent Protocol): agents communicating with each other
  • Agents.md: an open standard for providing procedural instructions to agents

There's been a debate about whether MCP is "dead" because loading tool schemas into context eats tokens. An alternative is using CLI (command line interface) tools directly. Agents can invoke locally installed packages via terminal commands instead. Both approaches work; they're just different ways for agents to call tools.

Jargon to know:

  • MCP: model context protocol; how agents talk to tools
  • A2A: agent-to-agent protocol; how agents talk to each other
  • CLI: command line interface; an alternative to MCP for tool invocation


Layer 11: Memory

When you close a Claude session, everything in context disappears. Memory is what persists.

  • Working memory: just another word for context; what the model can see right now
  • Episodic memory: the agent remembers conversations across sessions ("I remember we were talking about Python yesterday")
  • Semantic memory: facts from a knowledge base; your company wiki, Confluence, databases
  • Procedural memory: instructions on how to do things. This is what CLAUDE.md, agents.md, and skills files are. Just files of instructions that the agent reads every time and that don't disappear. That's what makes them memory.

RAG (Retrieval Augmented Generation) is the standard method for fetching information from memory. Large chunks of text are encoded as vectors, and when you ask a question, the system finds matching vectors, fetches the associated text, and injects it into the prompt so the agent can answer intelligently.
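A toy end-to-end sketch of that pipeline, using bag-of-words counts in place of real learned embeddings (the chunks and question are made up):

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (real systems use learned vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_prompt(question, chunks, k=1):
    """Retrieve the k chunks most similar to the question, inject them into the prompt."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    context = "\n".join(ranked[:k])
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"

chunks = [
    "The company holiday party is on December 12.",
    "Expense reports are due by the 5th of each month.",
]
prompt = build_prompt("When is the holiday party?", chunks)
```

The model never searches anything itself; the retrieval system picks the relevant chunk and the model just answers from the augmented prompt.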

Jargon to know:

  • RAG: retrieval augmented generation; fetching relevant documents and injecting them into context
  • Vector database: a store for those encoded text chunks, indexed by their vectors and optimized to quickly find the ones most similar to a query vector
  • Sandbox: an isolated environment where agents can run safely without going rogue. This is why people bought Mac Minis to run their OpenClaw assistants; they didn't trust it with full system access. Sandboxes can also be cloud-based.


Layer 12: Evals and Benchmarks

How do you know if an agent is good at its job? You test it.

An eval is just a test case. You write a prompt, describe what a good response looks like, and check whether the agent delivers. But what if the task is subjective, like "respond in a friendly but informative tone"? You either have a human review it, or you use a separate model to grade the first model's work. That's an LLM judge. For tasks with clear right/wrong answers, you just write a programmatic check: does the code compile, do the tests pass, does the output match the expected format?
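For objective tasks, an eval is barely more than a unit test. A minimal sketch with a toy agent and hypothetical test cases (for subjective criteria, the `check` function would call an LLM judge instead):

```python
def run_eval(agent, case):
    """One eval: a prompt plus a check of the agent's output."""
    output = agent(case["prompt"])
    return case["check"](output)

# Programmatic checks for tasks with clear right/wrong answers.
suite = [
    {"prompt": "What is 2 + 2?", "check": lambda out: "4" in out},
    {"prompt": "Return valid JSON for an empty object.",
     "check": lambda out: out.strip() == "{}"},
]

def toy_agent(prompt):
    # Stand-in for a real model call.
    return "4" if "2 + 2" in prompt else "{}"

score = sum(run_eval(toy_agent, case) for case in suite) / len(suite)
```

A benchmark is this same idea scaled up: hundreds or thousands of cases, agreed upon across the industry, with the score reported as a single number.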

A benchmark is a suite of evals that's domain-specific and widely recognized. When someone says "Codex performs well on SWE-Bench," they mean it scores well on a well-known set of software engineering evals. Companies use benchmarks to market their models: "Our model's brain is bigger and smarter than yours."

Fair criticism: benchmarks can be gamed. If the eval data leaked into the training data, the model has essentially seen the test. And when a company creates its own benchmark (like "Composer Bench" for Composer), the comparison isn't exactly unbiased. Good evals need to be continuously updated and ideally open-sourced.

Evals are an iterative process: start simple, watch the model make mistakes, add test cases for those mistakes, improve the prompt or model, repeat. You can check from multiple dimensions (correctness, reasoning path, ethics, formatting).

Jargon to know:

  • Eval: a test case for grading agent performance
  • Benchmark: a standardized suite of evals (e.g., SWE-Bench for coding)
  • LLM judge: using one model to grade another model's work


Putting It All Together

With this mental model, you can decode any AI news. Here are a few examples from the talk:

"Kimi 2.5 provides the foundation for Cursor's Composer 2": Kimi 2.5 is a pre-trained model. Cursor did post-training / fine-tuning on top of it to build a model that's great at coding tasks. This was actually controversial because Cursor initially didn't credit Kimi.

"How do we monitor internal coding agents for misalignment?": We know what coding agents are (models + tools + loop). Misalignment means the agent is deviating from what it was trained to do. It's no longer aligned with its intended task.

"Claude Code harness is public and I can run Kimi in it for 1/120th the cost of Opus": Claude Code is a harness. You can swap out the Claude model for a cheaper open-source model (Kimi) and keep all the tool management, error handling, and agent loop infrastructure. Same car, different engine.


The Mental Model

You don't need to understand every new release. You need a framework that lets you place each one in context:

  1. Attention → what the model focuses on
  2. Transformers → stacks of attention
  3. LLMs → stacks of transformers
  4. Post-training → turning autocomplete into an assistant
  5. Reasoning → making models think harder
  6. Tool calling → giving models the ability to act
  7. Agent loop → think, act, observe, repeat
  8. Harness → the scaffolding that makes agents reliable
  9. Orchestration → multiple agents working together
  10. Protocols → how agents communicate (MCP, A2A)
  11. Memory → what persists across sessions
  12. Evals → how you test all of the above

New releases almost always refer to something in one of these layers. Now you know where to place them.

