AI Agents from First Principles

Transformers, LLMs, MCP, orchestration, evals — there's too much jargon out there. In this talk, we build a mental model from scratch, starting from the tiniest transformer block all the way to multi-agent systems, so you can understand what's happening and keep up with new releases.

By Pranav Dhingra · March 22, 2026

Tools Used

None - just curiosity!


Download slides (PDF) →

This talk was given at EmpireHacks 2026, Cornell Tech's hackathon. There's a problem in the AI landscape today: there's just too much going on. An endless stream of jargon, LinkedIn hype posts, and Twitter threads that make it genuinely difficult to make sense of what's happening. The goal of this talk is to give you a mental model, built from first principles and layer by layer, so that you can parse any new release, any new buzzword, and immediately understand where it fits.

We start from the smallest possible building block and work our way up to multi-agent systems. Think of it as a layer cake: each layer builds on the one below it.


Layer 1: Attention

Attention is the foundation of everything. It's a piece of math that tells the model which words are most relevant when predicting the next word.

Large language models are fundamentally next token prediction systems. But to predict the right next token, the model needs to know which earlier words matter most. In "The cat sat on the mat," the word "cat" is far more important than "the." "The" appears everywhere and carries little meaning. The attention mechanism is what helps the model learn this kind of context.

Instead of naively predicting the next word based on just the previous few words, attention lets the model say: "This part of the sentence, whether it was the last sentence or ten sentences ago, is especially relevant to what I'm about to predict."
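The core computation can be sketched in a few lines. This is a toy, single-query version of scaled dot-product attention with made-up 2-d vectors; real models use learned, high-dimensional queries, keys, and values across many positions at once:

```python
import math

def softmax(scores):
    # Exponentiate (shifted by the max for numerical stability) and normalize.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query position."""
    d = len(query)
    # Score each earlier token by how well its key matches the query.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Blend the value vectors, weighted by relevance.
    blended = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return blended, weights

# Toy 2-d vectors: the query lines up with "cat", not "the".
keys = [[1.0, 0.0],   # "the"
        [0.0, 1.0]]   # "cat"
values = [[0.1, 0.1],
          [0.9, 0.9]]
query = [0.0, 2.0]
blended, weights = attention(query, keys, values)
```

Here "cat" ends up with most of the attention weight, so its value vector dominates the blend: exactly the "this word matters more" behavior described above.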

Jargon to know:

  • Attention: the mechanism that identifies which tokens matter for the current prediction
  • Weights / Parameters: the learnable variables inside a model that capture what it has learned during training. When someone says a model has "open weights," they mean the trained model is free to download and use.
  • Tokens: the smallest chunks a language model works with. You can think of a token as roughly a word, though words can be broken into sub-word pieces (e.g., "presentation" might become "present" + "ation")

Layer 2: Transformers

Take a bunch of attention mechanisms and stack them together with a feed-forward network, and you get a transformer block.

More attention heads mean more weights, which means a bigger "brain." The model can capture more information and richer context about what came before. That's really all a transformer is: attention and feed-forward layers stacked together.
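The block structure can be sketched as "attention, then feed-forward, each with a residual connection and normalization." This is a simplified pure-Python skeleton with made-up weights, and the attention output is passed in rather than recomputed; real blocks differ in many details (multiple heads, learned norms, different activation functions):

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize a vector to zero mean and unit variance.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def feed_forward(x, w_up, w_down):
    # Two linear maps with a ReLU in between.
    hidden = [max(0.0, sum(xi * wi for xi, wi in zip(x, row))) for row in w_up]
    return [sum(hi * wi for hi, wi in zip(hidden, row)) for row in w_down]

def transformer_block(x, attended, w_up, w_down):
    """One block: attention sub-layer, then feed-forward, each with a residual add."""
    # `attended` stands in for the output of the attention mechanism from Layer 1.
    h = layer_norm([a + b for a, b in zip(x, attended)])
    return layer_norm([a + b for a, b in zip(h, feed_forward(h, w_up, w_down))])

# Tiny example: 2-d input, 3-d hidden layer, 2-d output.
out = transformer_block(
    x=[1.0, 2.0],
    attended=[0.5, 0.5],
    w_up=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    w_down=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
)
```

Every weight in `w_up` and `w_down` (and in the attention mechanism) is a learnable parameter; stacking more blocks multiplies the count.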

Jargon to know:

  • Transformer: a stack of attention and feed-forward layers; the architecture behind every modern language model
  • Parameters: used interchangeably with weights in casual conversation. "How many parameters does the model have?" is just asking how big its brain is.
  • Neural network: a type of machine learning model; transformers are a specific kind of neural network


Layer 3: Large Language Models (LLMs)

Take multiple transformer blocks and stack them on top of each other. It just gets bigger and bigger: more learnable parameters, more capacity.

The scale matters:

  • Millions of parameters → can complete sentences reasonably well
  • Billions of parameters → can write coherent paragraphs
  • Hundreds of billions+ → the large language models we use today

Every time you see a new model release ("GPT-5.4 with 7 billion parameters" or "Mistral 7B"), that number is just telling you the size of the model's brain. It's a company's way of saying: look how big our model is, look how much time we spent training it.

The LLM landscape today includes OpenAI (GPT), Anthropic (Claude), Google (Gemini), Meta (Llama, open weights), DeepSeek (open weights), Mistral, and Grok (xAI).

Jargon to know:

  • LLM: a very large transformer-based language model
  • Pre-training: training the model to do next token prediction really well. It's called "pre" because there are more training phases after this one.
  • Context window: how many tokens the model can process at once. Claude has 200K and 1M token context window models. When you hit the limit, you need to start a new session, clear context, or compact your conversation.
  • Inference: just running a model to predict the next token. That's it. "We're building GPUs for inference" = "We're building GPUs to run models." Don't be intimidated by this word. It's fancy VC lingo at this point.

Layer 4: Post-Training

A model that's great at next token prediction isn't necessarily a great assistant. It's just autocomplete. Post-training is what turns an LLM into something helpful.

Supervised Fine-Tuning (SFT)

You show the model labeled examples of what a good assistant response looks like. This is where the model learns to format responses with a helpful tone, like the "Sure, absolutely!" before answering your question. When someone says "the difference between Claude and ChatGPT is probably just the fine-tuning," this is what they mean: different training data produces different personalities.
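A single SFT training example is just a prompt paired with a labeled "good" response. The exact format varies by lab; this chat-message structure is illustrative:

```python
# A hypothetical SFT training example: the user turn is the input,
# the assistant turn is the labeled "good" response.
sft_example = {
    "messages": [
        {"role": "user", "content": "What's the capital of France?"},
        {"role": "assistant",
         "content": "Sure, absolutely! The capital of France is Paris."},
    ]
}
# During fine-tuning, the next-token loss is typically computed only on the
# assistant's tokens, so the model learns the tone and format of good
# responses, not just the underlying facts.
```

Thousands of examples like this, written in a consistent voice, are what give each assistant its personality.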

Reinforcement Learning from Human Feedback (RLHF)

Humans are shown two model responses and asked which one they prefer. That preference signal gets fed back into the model, steering it toward better responses.

You may have seen this yourself. Apps like Granola or ChatGPT occasionally ask "Which response do you prefer, A or B?" That's them collecting feedback for their RL pipeline.

Getting quality human-labeled data is a genuine problem. Companies pay for expert annotators, but data quality varies widely. There's no perfect solution, and it's something every company is constantly working on.

Jargon to know:

  • Fine-tuning: additional training to specialize a model for a particular task or behavior
  • Supervised fine-tuning: fine-tuning with labeled examples
  • RLHF: reinforcement learning from human feedback
  • Post-training: the umbrella category for all training after pre-training
  • Alignment: post-training a model toward a certain goal. Safety alignment = training the model to refuse harmful requests. Whenever you see articles about "alignment" on the Anthropic website, this is what they're talking about.


Layer 5: Reasoning

How do you make a model think harder? There are three approaches, and they're often confused:

Chain of Thought Prompting

You explicitly tell the model: "Think step by step." The model then shows its work ("First I divide this, then I add these numbers..."). Research shows this actually improves performance significantly. The key is that the reasoning is visible to you in the output.
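The technique is literally one extra instruction in the prompt. A minimal sketch (the question and wording are made up for illustration):

```python
question = "A train travels 60 miles in 1.5 hours. What is its average speed?"

# The only difference between the two prompts is one instruction;
# the model's visible output changes dramatically.
plain_prompt = question
cot_prompt = (
    f"{question}\n"
    "Think step by step. Show your reasoning, then state the final answer."
)
```

With the plain prompt the model just emits an answer; with the chain-of-thought prompt it writes out the division first, and you can inspect that reasoning.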

Reasoning Models

Instead of prompting the model to think, you train it to think. The fine-tuning data includes step-by-step reasoning as the correct answer. So for 2 + 2, the correct training answer isn't just "4." It's "First I take 2, then the other 2, then I look at the operator, and I get 4." The chain of thought is invisible, baked into the model's behavior.

Extended Thinking

Reasoning models could keep thinking forever, burning through tokens. Extended thinking sets a budget for how long the model can reason. You're trading tokens and time for (potentially) better quality. This is why enabling extended thinking in Claude blows through your token budget.

Jargon to know:

  • Chain of thought: visible step-by-step reasoning in the output
  • Reasoning model: a model explicitly fine-tuned to reason internally
  • Extended thinking: a token budget (or toggle) that controls how long a model can reason


Layer 6: Tool Calling

This is where things get interesting. Tools enable a smart model to do things in the real world.

The flow: user asks "What's the weather in NYC?" → the model decides to call a get_weather function → the system executes the function → the result goes back to the model → the model responds in natural language.

A critical distinction: the model doesn't actually call the tool. The model is still just a next token prediction system. It says "I think we should call this tool," and then the surrounding system (the harness) handles the actual execution, error handling, and piping results back. The model reasons about which tool to call; the harness does the work.
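That flow can be simulated end to end. Here `fake_model` is a hypothetical stand-in for the LLM (no real API): it only emits structured text saying which tool to call, and the surrounding code does the actual execution:

```python
import json

def fake_model(messages):
    """Stand-in for an LLM: it only emits text/structured requests, never runs code."""
    if messages[-1]["role"] == "user":
        # The model "decides" a tool is needed and names it.
        return {"tool_call": {"name": "get_weather", "arguments": {"city": "NYC"}}}
    # With the tool result now in context, it answers in natural language.
    result = json.loads(messages[-1]["content"])
    return {"text": f"It's {result['temp_f']}F in {result['city']}."}

def get_weather(city):
    return {"city": city, "temp_f": 48}  # canned data for the sketch

TOOLS = {"get_weather": get_weather}

messages = [{"role": "user", "content": "What's the weather in NYC?"}]
reply = fake_model(messages)
if "tool_call" in reply:
    call = reply["tool_call"]
    # The surrounding system, not the model, executes the function.
    result = TOOLS[call["name"]](**call["arguments"])
    messages.append({"role": "tool", "content": json.dumps(result)})
    reply = fake_model(messages)
```

Note that the model is called twice: once to choose the tool, once to turn the tool's result into a natural-language answer.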

Jargon to know:

  • Tool calling / Function calling: same thing; giving a model access to external functions
  • API (Application Programming Interface): just another way to call a function. You hit an endpoint on the web, it does the work, and it returns a response.


Layer 7: The Agent Loop

Take a smart model, give it tools, and put it in a loop. That's an agent.

This is commonly called the ReAct loop (Reason + Action):

  1. Think — the model assesses where it is and what needs to happen next
  2. Act — it makes a tool call based on its reasoning
  3. Observe — it checks the results. Did the API return properly? Is the output well-formatted? Is the task complete?
  4. Loop — if the task isn't done, go back to step 1

It's essentially a glorified while loop. The agent keeps working, checking, and retrying until the task is complete or it hits a timeout.
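The steps above really do reduce to a loop. A minimal sketch, with a toy model and tool swapped in for a real LLM and real APIs:

```python
def agent_loop(task, model, tools, max_steps=10):
    """The glorified while loop: think, act, observe, repeat."""
    history = [task]
    for _ in range(max_steps):              # step limit doubles as a timeout
        decision = model(history)           # 1. Think
        if decision["done"]:
            return decision["answer"]
        result = tools[decision["tool"]](**decision["args"])  # 2. Act
        history.append(result)              # 3. Observe, then 4. Loop
    return "stopped: hit the step limit"

# Toy model and tool to exercise the loop.
def toy_model(history):
    if len(history) == 1:
        return {"done": False, "tool": "add", "args": {"a": 2, "b": 2}}
    return {"done": True, "answer": f"The result is {history[-1]}"}

answer = agent_loop("what is 2 + 2?", toy_model, {"add": lambda a, b: a + b})
```

Swap `toy_model` for a real LLM call and `tools` for real functions, and this is the skeleton of every coding agent.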

Agents today (Claude Code, Cursor, GitHub Copilot, Devin, Codex) can run locally on your machine, in the cloud on a virtual machine, or in hybrid mode where you start locally and shift to the cloud. Most of them support all three at this point. You can even communicate with ongoing cloud tasks from your phone.

Jargon to know:

  • Agent: a model with tools in a loop
  • ReAct: reason and action; the standard agent loop pattern
  • Agentic: if someone puts "agentic" in front of a word, they just mean it uses this agent loop. "Agentic legal assistant" = agent loop applied to law.

Layer 8: The Agent Harness

If the model isn't calling tools directly, what is? The harness.

The harness is all the code wrapped around the model that makes an agent work. Think of Claude Code: it's a harness wrapped around API calls to the Claude model. It takes your input, combines it with a system prompt, passes it to the model, and then handles everything the model can't do on its own:

  • Validates tool calls: models hallucinate. What if the tool is called get_weather but the model says get_weather_tomorrow? The harness maps and corrects.
  • Executes tools: actually runs the function calls
  • Handles errors: retries failed calls, manages timeouts
  • Enforces guardrails: safety checks, token limits, cost tracking, authentication

You can't just put an LLM out in the wild and expect it to work correctly. You need guardrails, error checking, and mapping from what the model says to what actually happens. That's the harness.
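One sliver of that work, validating a possibly hallucinated tool name and retrying failures, can be sketched like this. The fuzzy-matching approach is illustrative, not how any particular harness actually does it:

```python
import difflib

def run_tool(requested, args, tools, retries=2):
    """Validate the model's requested tool name, then execute with retries."""
    # Map a hallucinated "get_weather_tomorrow" onto the registered
    # "get_weather" instead of crashing.
    match = difflib.get_close_matches(requested, tools, n=1, cutoff=0.5)
    if not match:
        return {"error": f"unknown tool: {requested}"}
    last_error = None
    for _ in range(retries + 1):
        try:
            return {"result": tools[match[0]](**args)}
        except Exception as exc:   # retry transient failures
            last_error = str(exc)
    return {"error": last_error}

tools = {"get_weather": lambda city: f"48F in {city}"}
out = run_tool("get_weather_tomorrow", {"city": "NYC"}, tools)
```

Multiply this by guardrails, cost tracking, authentication, and context management, and you get a sense of how much code sits around the model.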

Here's the interesting part: harnesses are interchangeable with models. Claude Code is a harness for Claude, but you could plug in a different model (like Kimi) and use the same harness. Everything (tool management, error validation, the agent loop) stays the same. You just swap the brain.

Jargon to know:

  • Harness / Scaffolding / Framework / SDK: all roughly the same thing. The code around the model that makes it an agent. Different terms, same concept.


Layer 9: Multi-Agent Orchestration

One agent not enough? Run multiple agents on different tasks, potentially talking to each other.

Orchestration is like directing an orchestra: telling various agents what to do and how to work together.

The most common pattern is hub and spoke: one main agent (the orchestrator) delegates tasks to sub-agents. When you use Claude Code, you're the main thread. You ask it to do something, and it dispatches multiple agents to read the codebase, make changes, and report back.

There's also agent-to-agent coordination (agent teams), where agents skip the central orchestrator and talk directly to each other through a shared task list. This uses more tokens but can produce higher quality output from multiple perspectives.
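The hub-and-spoke pattern is simple to sketch. Here each "sub-agent" is a stub closure standing in for a full model-plus-tools-plus-loop agent:

```python
def orchestrate(task, subtasks, spawn_agent):
    """Hub and spoke: the orchestrator splits the task, delegates, collects reports."""
    reports = {}
    for subtask in subtasks:          # real systems often run these in parallel
        agent = spawn_agent(subtask)  # each sub-agent is its own model + tools + loop
        reports[subtask] = agent()
    return reports

# Toy sub-agents that just report on their assignment.
reports = orchestrate(
    "review the pull request",
    ["read the diff", "run the tests"],
    lambda subtask: (lambda: f"done: {subtask}"),
)
```

In a real system, the orchestrator is itself an agent: it decides the subtask split with a model call, then merges the reports into a final answer.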

Jargon to know:

  • Multi-agent orchestration: managing multiple agents working on multiple tasks
  • Sub-agent: an agent spawned by another agent
  • Hub and spoke: one orchestrator delegates to many workers


Layer 10: Communication Protocols

How do agents talk to tools and to each other? Through open standards: the internet agreeing to do things the same way, like putting USB-C on every phone.

  • MCP (Model Context Protocol): agents communicating with tools. An MCP server exposes a set of capabilities (like a weather API), and the agent decides when to use them.
  • A2A (Agent-to-Agent Protocol): agents communicating with each other
  • Agents.md: an open standard for providing procedural instructions to agents

There's been a debate about whether MCP is "dead" because loading tool schemas into context eats tokens. An alternative is using CLI (command line interface) tools directly. Agents can invoke locally installed packages via terminal commands instead. Both approaches work; they're just different ways for agents to call tools.

Jargon to know:

  • MCP: model context protocol; how agents talk to tools
  • A2A: agent-to-agent protocol; how agents talk to each other
  • CLI: command line interface; an alternative to MCP for tool invocation


Layer 11: Memory

When you close a Claude session, everything in context disappears. Memory is what persists.

  • Working memory: just another word for context; what the model can see right now
  • Episodic memory: the agent remembers conversations across sessions ("I remember we were talking about Python yesterday")
  • Semantic memory: facts from a knowledge base; your company wiki, Confluence, databases
  • Procedural memory: instructions on how to do things. This is what CLAUDE.md, agents.md, and skills files are. Just files of instructions that the agent reads every time and that don't disappear. That's what makes them memory.

RAG (Retrieval Augmented Generation) is the standard method for fetching information from memory. Large chunks of text are encoded as vectors, and when you ask a question, the system finds matching vectors, fetches the associated text, and injects it into the prompt so the agent can answer intelligently.
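A toy end-to-end sketch of that pipeline, using bag-of-words counts in place of real learned embeddings (the chunks and question are made up):

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (real systems use learned vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_prompt(question, chunks, k=1):
    """Retrieve the k chunks most similar to the question, inject them into the prompt."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    context = "\n".join(ranked[:k])
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"

chunks = [
    "The company holiday party is on December 12.",
    "Expense reports are due by the 5th of each month.",
]
prompt = build_prompt("When is the holiday party?", chunks)
```

The model never searches anything itself; the retrieval system picks the relevant chunk and the model just answers from the augmented prompt.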

Jargon to know:

  • RAG: retrieval augmented generation; fetching relevant documents and injecting them into context
  • Vector database: a store for those encoded text chunks, indexed by their vectors and optimized to quickly find the ones most similar to a query vector
  • Sandbox: an isolated environment where agents can run safely without going rogue. This is why people bought Mac Minis to run their OpenClaw assistants; they didn't trust it with full system access. Sandboxes can also be cloud-based.


Layer 12: Evals and Benchmarks

How do you know if an agent is good at its job? You test it.

An eval is just a test case. You write a prompt, describe what a good response looks like, and check whether the agent delivers. But what if the task is subjective, like "respond in a friendly but informative tone"? You either have a human review it, or you use a separate model to grade the first model's work. That's an LLM judge. For tasks with clear right/wrong answers, you just write a programmatic check: does the code compile, do the tests pass, does the output match the expected format?
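For objective tasks, an eval is barely more than a unit test. A minimal sketch with a toy agent and hypothetical test cases (for subjective criteria, the `check` function would call an LLM judge instead):

```python
def run_eval(agent, case):
    """One eval: a prompt plus a check of the agent's output."""
    output = agent(case["prompt"])
    return case["check"](output)

# Programmatic checks for tasks with clear right/wrong answers.
suite = [
    {"prompt": "What is 2 + 2?", "check": lambda out: "4" in out},
    {"prompt": "Return valid JSON for an empty object.",
     "check": lambda out: out.strip() == "{}"},
]

def toy_agent(prompt):
    # Stand-in for a real model call.
    return "4" if "2 + 2" in prompt else "{}"

score = sum(run_eval(toy_agent, case) for case in suite) / len(suite)
```

A benchmark is this same idea scaled up: hundreds or thousands of cases, agreed upon across the industry, with the score reported as a single number.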

A benchmark is a suite of evals that's domain-specific and widely recognized. When someone says "Codex performs well on SWE-Bench," they mean it scores well on a well-known set of software engineering evals. Companies use benchmarks to market their models: "Our model's brain is bigger and smarter than yours."

Fair criticism: benchmarks can be gamed. If the eval data leaked into the training data, the model has essentially seen the test. And when a company creates its own benchmark (like "Composer Bench" for Composer), the comparison isn't exactly unbiased. Good evals need to be continuously updated and ideally open-sourced.

Evals are an iterative process: start simple, watch the model make mistakes, add test cases for those mistakes, improve the prompt or model, repeat. You can check from multiple dimensions (correctness, reasoning path, ethics, formatting).

Jargon to know:

  • Eval: a test case for grading agent performance
  • Benchmark: a standardized suite of evals (e.g., SWE-Bench for coding)
  • LLM judge: using one model to grade another model's work


Putting It All Together

With this mental model, you can decode any AI news. Here are a few examples from the talk:

"Kimi 2.5 provides the foundation for Cursor's Composer 2": Kimi 2.5 is a pre-trained model. Cursor did post-training / fine-tuning on top of it to build a model that's great at coding tasks. This was actually controversial because Cursor initially didn't credit Kimi.

"How do we monitor internal coding agents for misalignment?": We know what coding agents are (models + tools + loop). Misalignment means the agent is deviating from what it was trained to do. It's no longer aligned with its intended task.

"Claude Code harness is public and I can run Kimi in it for 1/120th the cost of Opus": Claude Code is a harness. You can swap out the Claude model for a cheaper open-source model (Kimi) and keep all the tool management, error handling, and agent loop infrastructure. Same car, different engine.


The Mental Model

You don't need to understand every new release. You need a framework that lets you place each one in context:

  1. Attention → what the model focuses on
  2. Transformers → stacks of attention
  3. LLMs → stacks of transformers
  4. Post-training → turning autocomplete into an assistant
  5. Reasoning → making models think harder
  6. Tool calling → giving models the ability to act
  7. Agent loop → think, act, observe, repeat
  8. Harness → the scaffolding that makes agents reliable
  9. Orchestration → multiple agents working together
  10. Protocols → how agents communicate (MCP, A2A)
  11. Memory → what persists across sessions
  12. Evals → how you test all of the above

New releases almost always refer to something in one of these layers. Now you know where to place them.

