Every project hits this wall eventually. Someone new joins, or a non-technical stakeholder wants to understand how the system works, and you spend three hours on a Zoom call drawing boxes and arrows on a whiteboard. Then they ask the same questions next week.
I've been on both sides of this. I've been the senior dev mumbling "well, it's complicated" while scrolling through a monorepo, and I've been the new hire staring at a src/ folder with 200 files wondering where anything actually starts.
The real problem isn't that codebases are complex. It's that we have almost no tooling for making them teachable.
Why Traditional Documentation Fails
Let's be honest about what usually happens with onboarding docs:
- Someone writes a CONTRIBUTING.md that's outdated within a month
- Architecture docs live in Notion and reference files that got renamed in Q2
- README files explain how to install, not how to understand
- Inline comments explain what the code does, not why it's structured that way
The root cause is a mismatch between how documentation is written (flat, linear, static) and how people actually learn codebases (exploratory, layered, interactive). You don't read a codebase top-to-bottom like a novel. You jump around, build mental models, and circle back.
The "Codebase as Curriculum" Approach
I recently came across an interesting approach to this problem: treating your entire codebase as source material for generating an interactive course. The idea is that an AI reads through your code, understands the structure and relationships, and produces a single-page HTML document that walks someone through the project like a guided tutorial.
The open-source project codebase-to-course does exactly this. It's a Claude Code skill — meaning it plugs directly into your development workflow — and it analyzes your repo to produce a self-contained, interactive HTML page.
Here's what the setup looks like:
# Add the skill to your Claude Code environment
# Navigate to your project directory first
cd your-project
# Install the skill (it integrates as a Claude Code slash command)
# Check the repo README for the latest installation method
git clone https://github.com/zarazhangrui/codebase-to-course.git
Once installed, you point it at your codebase, and it generates a course that breaks your project into digestible modules. Think of it like an AI teaching assistant that's read every file in your repo and can explain the architecture to a newcomer.
The Real Problem It Solves: Vibe Coders
Here's where it gets interesting. The target audience isn't just junior developers. It's what people are calling "vibe coders" — folks who use AI tools to build software without deep programming knowledge. They can prompt their way to a working app, but when something breaks or they need to modify generated code, they're lost.
This is a real and growing problem. I've seen it firsthand: someone builds an entire Next.js app with AI assistance, then has no idea why their API route returns a 500 because they don't understand the request lifecycle.
A generated course bridges that gap:
## Example course module structure (conceptual)
Module 1: Project Overview
→ What this app does (plain English)
→ Key technologies and why they were chosen
Module 2: Entry Points
→ Where the app starts executing
→ How requests flow through the system
Module 3: Data Layer
→ How data is stored and retrieved
→ Database schema walkthrough
Module 4: Business Logic
→ Core functions and their relationships
→ Decision points and edge cases
The single-page HTML format is a smart choice. No build step, no dependencies, no server. You can email it, drop it in Slack, or host it as a static file. It just works.
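To make the single-page idea concrete, here is a minimal sketch of how a module outline like the one above could be turned into one self-contained HTML file. The names `MODULES` and `render_course` are hypothetical and are not taken from codebase-to-course; the sketch only assumes Python's standard library and the HTML `<details>` element for collapsible sections.

```python
from html import escape

# Hypothetical outline mirroring the conceptual module structure above.
MODULES = [
    ("Project Overview", ["What this app does (plain English)",
                          "Key technologies and why they were chosen"]),
    ("Entry Points", ["Where the app starts executing",
                      "How requests flow through the system"]),
]


def render_course(title, modules):
    """Return one self-contained HTML page with a TOC and collapsible modules."""
    toc, sections = [], []
    for number, (name, topics) in enumerate(modules, start=1):
        anchor = f"module-{number}"
        label = f"Module {number}: {escape(name)}"
        toc.append(f'<li><a href="#{anchor}">{label}</a></li>')
        items = "".join(f"<li>{escape(topic)}</li>" for topic in topics)
        # <details> gives collapsible sections without any JavaScript.
        sections.append(
            f'<details id="{anchor}"><summary>{label}</summary>'
            f"<ul>{items}</ul></details>"
        )
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="en"><head><meta charset="utf-8"><title>{escape(title)}</title>\n'
        "<style>body{font-family:sans-serif;max-width:40em;margin:2em auto}</style>\n"
        f"</head><body><h1>{escape(title)}</h1>\n"
        f"<nav><ol>{''.join(toc)}</ol></nav>\n"
        + "\n".join(sections)
        + "\n</body></html>\n"
    )
```

Writing the returned string to a file such as `course.html` gives you something you can email or host as a static file, since the styling is inline and nothing is loaded from elsewhere.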
Building Your Own Lightweight Version
Even if you don't use the tool directly, the pattern is worth stealing. You can build a simpler version of this workflow with any LLM that supports large context windows:
import pathlib

def collect_codebase(root_dir, extensions=None):
    """Gather all relevant source files into a single context string."""
    if extensions is None:
        extensions = {'.py', '.js', '.ts', '.jsx', '.tsx', '.go', '.rs'}
    files = []
    for path in pathlib.Path(root_dir).rglob('*'):
        # Skip directories (a folder can be named like "pkg.js") and vendored deps
        if not path.is_file() or 'node_modules' in path.parts:
            continue
        if path.suffix in extensions:
            try:
                content = path.read_text(encoding='utf-8')
                # Include the file path as context for the AI
                files.append(f"--- FILE: {path.relative_to(root_dir)} ---\n{content}")
            except (UnicodeDecodeError, PermissionError):
                continue  # Skip binary or protected files
    return '\n\n'.join(files)

def generate_course_prompt(codebase_text):
    """Build a prompt that asks the AI to create a teaching document."""
    return f"""Analyze this codebase and create an interactive HTML course
that teaches a non-technical person how the project works.
Structure it as progressive modules, starting from high-level
architecture down to specific implementation details.
Include:
- Visual diagrams using simple HTML/CSS (no external deps)
- Code snippets with annotations
- "Try to understand" exercises
- A clickable table of contents
Output a single, self-contained HTML file.
CODEBASE:
{codebase_text}"""
The key insight is collecting files with their paths intact. The directory structure is documentation: it tells the AI how the developer organized their thinking.
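One way to lean on that idea further is to hand the model an explicit outline of the directory layout ahead of the file contents. The sketch below is an illustration rather than part of any existing tool: `build_tree_outline` is a hypothetical helper, and it assumes you pass it forward-slash relative paths (for example via `path.relative_to(root_dir).as_posix()`).

```python
from pathlib import PurePosixPath


def build_tree_outline(relative_paths):
    """Render relative file paths as an indented directory outline."""
    lines = []
    seen_dirs = set()
    # Sorting keeps every directory's contents next to each other.
    for raw in sorted(relative_paths):
        parts = PurePosixPath(raw).parts
        # Emit each directory once, indented by its depth.
        for depth in range(len(parts) - 1):
            directory = parts[: depth + 1]
            if directory not in seen_dirs:
                seen_dirs.add(directory)
                lines.append("  " * depth + parts[depth] + "/")
        # Then the file itself, one level deeper than its parent directory.
        lines.append("  " * (len(parts) - 1) + parts[-1])
    return "\n".join(lines)


example = ["src/app.py", "src/api/routes.py", "README.md"]
print(build_tree_outline(example))
```

For the three example paths this prints README.md at the top level, then src/ with api/ and its routes.py nested beneath it, followed by app.py. Prepending an outline like this to the prompt gives the model the project's shape before it reads any individual file.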
What Actually Makes This Work
Three things make the codebase-to-course pattern effective:
Progressive disclosure. Instead of dumping everything at once, a good generated course reveals complexity gradually. Start with "this is a web app that does X" and work down to "here's how the rate limiter handles burst traffic."
Interactive navigation. A single HTML page with collapsible sections and a sidebar lets learners explore at their own pace. This is fundamentally different from scrolling through a static doc.
Code in context. Rather than showing isolated snippets, the course presents code alongside explanations of why it exists and how it connects to other parts of the system. This is the thing that READMEs almost never do.
Preventing the Onboarding Problem
The best fix is prevention. A few habits that keep codebases teachable:
- Name things for newcomers, not for yourself. handleUserAuthenticationFlow beats doAuth. Yes, it's longer. Your future teammates will thank you.
- Keep your entry points obvious. If someone can't find where your app starts within 30 seconds of opening the repo, your structure needs work.
- Write ADRs (Architecture Decision Records). A short markdown file explaining why you chose Postgres over MongoDB, or why the auth service is separate, saves hours of explanation later.
- Generate courses as part of CI. If you adopt a tool like codebase-to-course, run it on every release. Keep the output alongside your docs. Stale courses are better than no courses, and automated ones stay fresher.
# Example: generate course on release (GitHub Actions)
name: Generate Course
on:
  release:
    types: [published]
jobs:
  build-course:
    runs-on: ubuntu-latest
    permissions:
      contents: write  # needed to attach files to the release
    steps:
      - uses: actions/checkout@v4
      - name: Generate interactive course
        run: |
          # Run your course generation tool here
          # Output goes to docs/course.html
          echo "Course generation step"
      - name: Upload as release asset
        uses: softprops/action-gh-release@v1
        with:
          files: docs/course.html
The Bigger Picture
We're entering an era where a lot of code is written by people who don't fully understand it. That's not a criticism — it's just the reality of AI-assisted development. The tooling needs to catch up.
Turning codebases into interactive learning materials isn't just a nice-to-have anymore. It's becoming essential infrastructure for teams that include vibe coders, non-technical founders, or anyone who needs to understand what the software actually does without reading every line.
The codebase-to-course pattern — whether you use the open-source tool or roll your own — is one of the more practical solutions I've seen to a problem that's only getting bigger.
