**Lecture 32: LLM APIs & the Agency Spectrum**
**Total Time: 50 minutes**

**[3 min] 0. Housekeeping**
- Roadmap image + reading (smolagents intro, Anthropic API quickstart)

**[15 min] 1. WHAT: From Models to Agents**
- **AI Evolution: The Role of Language**
  - Deep Learning: no LLM, predefined tasks (Lec 11–25)
  - Foundation Model: LLM, on-the-fly tasks via prompts (Lec 26–31)
  - ???: LLM generates text → code!!! (Lec 32–36)
- **Code #1: Control the Loop** — FM: human drives the conversation (VQA). Agent: LLM decides what to do next and when to stop.
- **Code #2: Tool Use** — FM: talks about actions ("you should run rm..."). Agent: actually executes code with side effects.
- **Code #3: Add Feedback** — FM: plans in one shot. Agent: executes, observes errors (e.g., OOM), adapts.
- **Agency Spectrum**: FM → Router (#1) → Tool Call (#2) → Multi-step (#1+2+3) → Multi-Agent (N×) → Code Agent (∞)
  - How much code does the LLM control? More code → more agency.
- **Agents in the Wild**: 3 examples — Coding Agent (Claude Code), Personal Assistant (scheduling), Biomedical Discovery Agent (read papers → hypothesize → experiment → analyze)
- **Demo**: Play first 2 min of "Introducing Claude Code" video

**[8 min] 2. WHY: Three Reasons Agents Matter**
- **Why #1: Unpredictable problems need flexible solutions** — If you can draw a flowchart → hardcode. If not → agent.
- **Why #2: Human-in-the-loop doesn't scale** — VQA: human at every step, 1 task at a time. Agent: one prompt, autonomous, scales to 100 scans.
- **Why #3: Tools multiply intelligence** — LLM alone = brain without hands. LLM + N tools = combinatorial power. Biological parallel: language + tool use = what separates humans from animals.

**[7 min] 3.
HOW: Building Agents**
- **What we're building**: Two columns — Coding Agent (Lec 32–35, leaked Claude Code → nano-claude-code) + Personal Agent (Lec 36, OpenClaw)
- **Two "open-sourced" codebases for Claude Code**:
  - nano-claude-code (5 files, ~1,200 lines, working agent) — github.com/SafeRL-Lab/cheetahclaws
  - claw-code (66 files, architectural blueprint, 184 tools) — github.com/ultraworkers/claw-code
- **Roadmap: Follow the Spectrum**:
  - Lec 32: FM (API) → config.py
  - Lec 33: Router + Tool Call + Multi-step → tools.py, agent.py, context.py, nano_claude.py
  - Lec 34: Multi-Agent → subagent.py + biomedical agents
  - Lec 35: Code Agent → claw-code architecture
  - Lec 36: Personal Agent → OpenClaw

**[15 min] 4. Five Things About LLM APIs**
- **(a) API Key: Q → A** — Simplest call: import anthropic, create client, messages.create(). Key params: model, max_tokens, system, messages, temperature, tools (Lec 33). Streaming vs non-streaming.
- **(b) Tokens & Cost** — ~0.75 words/token. Cost = input + output tokens. Model pricing (Sonnet $3/$15, Opus $15/$75 per M tokens). Context window = memory (200K Claude, 128K GPT-4o, 1M Gemini). Agents fill context fast.
- **(c) Dialog — Messages Format** — Conversation = list of role-tagged messages (system, user, assistant). Append to continue multi-turn. Same format across all providers.
- **(d) Structured Output** — Problem: raw text is hard to parse. Solution: prompt for JSON. Even better: tool-based schemas (Lec 33).
- **(e) Vision — Images + Text** — Multimodal: content = list of image + text blocks. base64 or URL. Connects to Lec 30–31 (CLIP, LLaVA). Biomed: histology, radiology, cell counting.
- **(f) In config.py** — Maps a–e to nano-claude-code. config.py (76 lines) handles a + b. Dialog/structured/vision live in agent.py and tools.py (Lec 33).

**[2 min] 5.
Wrap-Up**
- **Today**: What (LLM → code → agent), Why (unpredictable, scale, tools), How (API basics a–e)
- **Covered in nano-claude-code**: config.py
- **Next, Lec 33**: Tools + Agent Loop — tool schemas, while-loop, permissions, context, full REPL (tools.py, agent.py, context.py, nano_claude.py)
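The Code #1–#3 distinction in Section 1 (agent controls the loop, executes tools, adapts to feedback) can be sketched in a few lines. Everything here is a hypothetical stand-in for illustration — `fake_llm`, the `count_cells` tool, and the history format are not nano-claude-code's actual API:

```python
# Minimal agent-loop sketch: the LLM (stubbed here as fake_llm) picks
# the next action, the harness executes it, and the observation feeds
# back into the history. All names are illustrative, not a real API.

def fake_llm(history):
    # Stand-in policy: call the tool once, then stop.
    if any(msg["role"] == "tool" for msg in history):
        return {"action": "stop", "answer": "done"}
    return {"action": "run", "tool": "count_cells", "args": {"scan": "slide_01"}}

TOOLS = {"count_cells": lambda scan: f"{scan}: 42 cells"}  # hypothetical tool

def run_agent(task, max_steps=5):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):                    # Code #1: the agent owns the loop
        decision = fake_llm(history)
        if decision["action"] == "stop":          # ...and decides when to stop
            return decision["answer"]
        result = TOOLS[decision["tool"]](**decision["args"])  # Code #2: real execution
        history.append({"role": "tool", "content": result})   # Code #3: feedback in
    return "max steps reached"

print(run_agent("Count cells in slide_01"))  # → done
```

Swapping `fake_llm` for a real API call turns this sketch into the multi-step agent of Lec 33.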
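Points (a) and (c) of Section 4 can be sketched together: a conversation is just a list of role-tagged dicts you keep appending to, and the simplest call passes that list to `messages.create()`. The model ID and prompt strings below are assumptions; the call itself is guarded so the sketch runs without a key:

```python
import os

# (c) Messages format: a conversation is a list of role-tagged dicts;
# append the assistant's reply plus the next user turn to continue.
messages = [{"role": "user", "content": "What cell type is shown?"}]
messages.append({"role": "assistant", "content": "It looks like a lymphocyte."})
messages.append({"role": "user", "content": "How confident are you?"})

# (a) Simplest call -- only runs if the SDK and a key are available.
# The model ID is an assumption; check Anthropic's docs for current names.
if os.environ.get("ANTHROPIC_API_KEY"):
    import anthropic
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model ID
        max_tokens=256,
        system="You are a concise pathology assistant.",  # system prompt is its own param
        messages=messages,
        temperature=0.2,
    )
    print(resp.content[0].text)

print([m["role"] for m in messages])  # → ['user', 'assistant', 'user']
```

Note the design choice in the Anthropic format: the system prompt is a separate parameter, not a message in the list, while OpenAI-style APIs put it in the list as `{"role": "system", ...}`.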
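The cost model in (b) is simple enough to check by hand. The rates below are the outline's Sonnet figures ($3 in / $15 out per M tokens); real pricing may change, so treat this as back-of-envelope arithmetic only:

```python
# (b) Cost = input tokens at the input rate + output tokens at the output rate.
WORDS_PER_TOKEN = 0.75  # rough rule of thumb from the lecture

def cost_usd(input_tokens, output_tokens, in_per_m=3.0, out_per_m=15.0):
    """Dollar cost of one call, given per-million-token rates."""
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m

# A 10K-token prompt with a 2K-token reply:
print(round(cost_usd(10_000, 2_000), 4))       # → 0.06
# Why "agents fill context fast" matters: 50 such turns, with the growing
# history resent each time, already lands in dollars rather than cents.
print(round(50 * cost_usd(10_000, 2_000), 2))  # → 3.0
```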
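For (d), a model asked to "reply in JSON" often wraps the object in markdown fences or prose, so parse defensively. This helper is a sketch of the prompt-for-JSON workaround; the tool-based schemas of Lec 33 avoid the problem entirely:

```python
import json
import re

def parse_json_reply(text: str) -> dict:
    """Extract and parse the first {...} span from a model reply,
    tolerating markdown fences and surrounding chatter."""
    match = re.search(r"\{.*\}", text, re.DOTALL)  # outermost braces
    if match is None:
        raise ValueError("no JSON object found in model reply")
    return json.loads(match.group(0))

# A typical fenced reply (illustrative example, not real model output):
reply = 'Sure! Here is the result:\n```json\n{"diagnosis": "benign", "confidence": 0.9}\n```'
print(parse_json_reply(reply))  # → {'diagnosis': 'benign', 'confidence': 0.9}
```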
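For (e), a multimodal user turn replaces the content string with a list of image and text blocks. The block shapes below follow Anthropic's base64 image format as I recall it (verify against the API reference), and the image bytes are a fake placeholder:

```python
import base64

# (e) Vision: content = list of image + text blocks in one user message.
fake_png = base64.b64encode(b"\x89PNG...").decode()  # stand-in for real image bytes

content = [
    {
        "type": "image",
        "source": {"type": "base64", "media_type": "image/png", "data": fake_png},
    },
    {"type": "text", "text": "How many cells are in this histology patch?"},
]
messages = [{"role": "user", "content": content}]

print([block["type"] for block in content])  # → ['image', 'text']
```

This is the same `messages` list as in (c) — only the content payload changes, which is what lets the Lec 30–31 vision ideas plug straight into the agent loop.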