We've all been there. You fire up your AI coding agent, ask it to write a migration script, and it produces something that technically works but misses every convention your team actually uses. Then you ask it to review a PR and it gives you generic advice that ignores your project's architecture entirely.
The problem isn't that AI coding agents are bad. The problem is you're asking one generalist agent to be an expert at everything.
## The Root Cause: One Prompt to Rule Them All
Most developers interact with AI coding tools using a single, default system prompt. Maybe you've customized it a bit — added some notes about your preferred language or framework. But fundamentally, you're sending every task through the same generic pipeline.
Think about it like this: you wouldn't ask your backend engineer to also design your icons, write your marketing copy, and configure your Kubernetes cluster. Specialization exists for a reason.
When you ask a generic agent to handle a database migration, it doesn't know:
- Your team's naming conventions for migration files
- Whether you prefer raw SQL or an ORM's migration builder
- How your rollback strategy works
- What your CI pipeline expects from migration files
So it guesses. And guessing means you spend 20 minutes fixing what was supposed to save you time.
## The Fix: Specialized Subagents
The solution is to break your monolithic agent into specialized subagents — each one tuned for a specific development task with its own system prompt, constraints, and context.
The concept has been gaining traction in the open-source community, with curated collections of 100+ specialized agent configurations now available on GitHub. The idea is simple: instead of one agent that's mediocre at everything, you maintain a library of agents that are genuinely good at specific things.
Here's what a basic subagent configuration looks like:
```markdown
# agents/code-reviewer.md
---
name: code-reviewer
description: Reviews pull requests with focus on security and performance
---
You are a senior code reviewer. When reviewing code:

1. Check for common security vulnerabilities (SQL injection, XSS, SSRF)
2. Flag any N+1 query patterns in database access code
3. Verify error handling covers edge cases
4. Ensure new code follows existing patterns in the codebase

Do NOT suggest stylistic changes — the linter handles that.
Do NOT rewrite working code just because you'd write it differently.
Focus only on bugs, security issues, and performance problems.
```

Notice how specific that is. It tells the agent what to focus on and what to ignore. That "Do NOT suggest stylistic changes" line alone saves you from sifting through 30 nitpicks about bracket placement.
## Building Your Own Subagent Library
Here's the approach I've been using across a few projects. Start with the tasks where a generic agent frustrates you most.
### Step 1: Identify Your Pain Points
List out the tasks where your AI agent consistently produces mediocre output. For most teams, it's:
- Database migrations
- Test generation (especially integration tests)
- Code review
- Documentation
- Debugging specific frameworks
- CI/CD configuration
### Step 2: Write Focused System Prompts
For each pain point, write a system prompt that captures your team's actual practices. Be brutally specific.
```markdown
# agents/test-writer.md
---
name: test-writer
description: Generates tests following project conventions
---
You write tests for a Node.js project using Vitest.

Rules:
- Use `describe` blocks grouped by method name
- Each test name starts with "should" followed by expected behavior
- Use `beforeEach` for shared setup, never duplicate setup across tests
- Mock external HTTP calls with msw, never with manual jest.fn() stubs
- For database tests, use the test transaction wrapper from `src/test/helpers.ts`
- Always test the error path, not just the happy path
- Aim for one assertion per test — split if a test checks multiple behaviors

Example test structure:

    describe('createUser', () => {
      // Wraps each test in a transaction that rolls back after
      const db = withTestTransaction()

      it('should create a user with valid input', async () => {
        const user = await createUser(db, {
          email: 'test@example.com',
          name: 'Test User',
        })
        expect(user.id).toBeDefined()
      })

      it('should throw on duplicate email', async () => {
        await createUser(db, { email: 'dupe@test.com', name: 'First' })
        await expect(
          createUser(db, { email: 'dupe@test.com', name: 'Second' })
        ).rejects.toThrow('Email already exists')
      })
    })
```
That system prompt encodes decisions your team has already made. The agent doesn't need to figure out your testing philosophy from scratch every time.
### Step 3: Organize and Invoke
Keep your subagent configs in a directory structure that makes sense:

```
.agents/
├── code-review/
│   ├── security-reviewer.md
│   └── performance-reviewer.md
├── testing/
│   ├── unit-test-writer.md
│   └── integration-test-writer.md
├── migrations/
│   └── sql-migration-writer.md
└── docs/
    └── api-doc-writer.md
```
Most modern AI coding tools — including open-source CLI agents — support loading custom agent configurations from markdown files. When you invoke a task, you point it at the relevant subagent instead of the default.
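For tools that don't load agent files natively, the glue code is small. Here's a minimal sketch of a loader (the function and interface names are my own, assuming the frontmatter format shown in the examples above):

```typescript
// Hypothetical loader: splits YAML-style frontmatter from the prompt body
// of an agent markdown file like the ones shown above.
interface AgentConfig {
  name: string;
  description: string;
  systemPrompt: string;
}

function parseAgentFile(markdown: string): AgentConfig {
  // Frontmatter sits between the first two `---` lines.
  const match = markdown.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)$/);
  if (!match) throw new Error("Missing frontmatter block");

  // Parse simple `key: value` pairs from the frontmatter.
  const fields: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > -1) {
      fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
    }
  }

  return {
    name: fields["name"] ?? "",
    description: fields["description"] ?? "",
    systemPrompt: match[2].trim(),
  };
}
```

Whatever you parse out of the file becomes the system prompt for that one invocation, replacing the generic default.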
With tools like OpenAI's Codex CLI, you can reference these directly. Instead of a generic request:

```shell
codex "write tests for the user service"
```

point to your specialized subagent:

```shell
codex --agent .agents/testing/unit-test-writer.md "write tests for the user service"
```

The difference in output quality is night and day.
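You can also hide the file paths behind a tiny routing layer, so teammates don't need to remember where each config lives. A sketch (the route table and function are my own invention, using the directory layout above):

```typescript
// Hypothetical task-to-agent router: matches a keyword in the task
// description to an agent config path from the .agents/ tree above.
const AGENT_ROUTES: Record<string, string> = {
  test: ".agents/testing/unit-test-writer.md",
  review: ".agents/code-review/security-reviewer.md",
  migration: ".agents/migrations/sql-migration-writer.md",
};

function routeAgent(task: string): string | undefined {
  const key = Object.keys(AGENT_ROUTES).find((k) =>
    task.toLowerCase().includes(k)
  );
  return key ? AGENT_ROUTES[key] : undefined;
}
```

A wrapper script can then call `routeAgent`, fall back to the default prompt when nothing matches, and pass the resolved path to whatever CLI you use.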
## Why This Actually Works

Three reasons:

1. **Narrower scope means fewer guesses.** A focused prompt shrinks the space of plausible outputs, so the agent stops guessing at conventions it was never told about.
2. **The prompt encodes decisions your team already made.** Naming schemes, tooling choices, rollback strategy: the agent inherits them instead of rediscovering or ignoring them on every task.
3. **Explicit exclusions cut the noise.** Telling an agent what *not* to do is as valuable as telling it what to do. It's the difference between three real findings and thirty nitpicks.
## Prevention: Stop the Drift
The biggest risk with subagents is letting them go stale. Your conventions evolve, but if your agent configs don't keep up, you're back to square one.
A few things that help:
- Version control your agents alongside your code. They should live in the repo, not in someone's personal config. When someone changes a convention, they update the relevant agent file in the same PR.
- Review agent output periodically. Every couple of weeks, spot-check what your subagents are producing. If you notice recurring issues, update the prompt.
- Start small. You don't need 130 subagents on day one. Start with 3-5 for your most painful tasks. Add more as you identify gaps. A small set of well-maintained agents beats a huge collection of stale ones.
- Share across teams. If your org has multiple repos with similar conventions, extract common agent configs into a shared package. This is where open-source collections really shine — they give you solid starting points that you can customize.
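Part of the "review periodically" habit can be automated. Here's a hedged sketch of a CI check that fails when an agent file is malformed (the required fields are my assumption, based on the frontmatter used in the configs above):

```typescript
// Sketch of a CI guard against malformed agent configs: returns a list
// of problems for one agent markdown file, empty if the file is valid.
const REQUIRED_FIELDS = ["name", "description"];

function validateAgentConfig(markdown: string): string[] {
  const match = markdown.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)$/);
  if (!match) return ["missing frontmatter block"];

  const errors: string[] = [];
  for (const field of REQUIRED_FIELDS) {
    // Each required field must appear as `field: <value>` in the frontmatter.
    if (!new RegExp(`^${field}:\\s*\\S`, "m").test(match[1])) {
      errors.push(`missing required field: ${field}`);
    }
  }
  if (match[2].trim().length === 0) {
    errors.push("empty system prompt");
  }
  return errors;
}
```

Run this over every file under `.agents/` in CI, and a renamed field or an accidentally emptied prompt fails the build instead of silently degrading agent output.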
## The Takeaway
Generic AI agents produce generic output. That's not a bug in the model — it's a bug in how we use it. The fix isn't waiting for a smarter model. It's giving the current model better instructions for each specific job.
I've been running specialized subagents on two production projects for the past couple of months, and the quality improvement has been immediate. The test-writer agent alone probably saves me 30 minutes a day in cleanup and revision.
Grab a curated collection from GitHub as a starting point, strip out what you don't need, customize what you keep, and commit the configs to your repo. Your future self will thank you.