
Technical Writing for Developers: Docs, ADRs, Runbooks, and RFCs That Get Read

Write technical documentation that engineers actually use — API docs, architecture decision records, runbooks, and RFCs. Includes templates, examples, and tools.

Viprasol Tech Team
March 28, 2026
11 min read


Good documentation is a force multiplier. A well-written runbook reduces a 4 AM incident from 90 minutes to 15. A clear ADR prevents relitigating the same architectural debate six months later. An accurate API reference cuts support requests for your internal platform in half.

Bad documentation — incomplete, outdated, or so vague it could describe anything — is worse than no documentation. Engineers learn to ignore it, and the cost is paid in onboarding time, repeated mistakes, and institutional knowledge locked in individuals' heads.

This guide covers the four document types that provide the most value for engineering teams, with templates and real examples.


## The Documentation Hierarchy

Different documentation serves different readers at different times:

| Type | Reader | When | Goal |
|---|---|---|---|
| README | Anyone new | First contact | Orient, not overwhelm |
| API Reference | Integrating developers | During development | Accurate, searchable, complete |
| ADR | Future team members | When revisiting a decision | Understand why, not just what |
| Runbook | On-call engineer | 4 AM incident | Execute under pressure |
| RFC | Engineering team | Before major changes | Alignment before commitment |
| Architecture docs | Team + stakeholders | Onboarding, planning | Big-picture understanding |

The most common mistake is writing for the wrong reader: deeply technical ADR content dropped into a customer-facing knowledge base, or a README that explains the architecture but skips "how to run this locally."


## 1. READMEs That Work

A README's job is not to document everything. It has one job: get a new contributor from zero to running locally in under 20 minutes.

**README template:**

````markdown
# Service Name

One-sentence description of what this service does and who uses it.

## Prerequisites

- Node.js 20+
- PostgreSQL 16
- Redis 7

## Setup

```bash
# Clone and install
git clone https://github.com/org/service-name
cd service-name
npm install

# Configure environment
cp .env.example .env
# Edit .env — see .env.example for required values

# Run database migrations
npm run db:migrate

# Start development server
npm run dev
```

The app runs at http://localhost:3000.

## Testing

```bash
npm test                  # Unit tests
npm run test:integration  # Integration tests (requires Postgres + Redis)
npm run test:e2e          # End-to-end (requires running app)
```

## Key Concepts

Brief explanation of the 2–3 things a new contributor needs to understand to work on this. Not the whole architecture — just what makes this service different from a standard REST API.

## Directory Structure

```
src/
  routes/     API route handlers
  services/   Business logic
  models/     Database models (Prisma)
  lib/        Shared utilities
  workers/    Background job processors
```

## Deployment

Link to the deployment runbook or describe the deploy command.

## Who to Ask

- Architecture questions: @alice
- On-call rotation: #on-call Slack channel
- Incident history: PagerDuty dashboard (link)
````

Keep it short. If the README is more than 300 lines, it's a wiki article, not a README; move the detail out and link to it.
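The 300-line rule can be enforced mechanically. A small CI-style sketch, assuming the check runs from the repository root (the threshold comes from this article; treat it as a team convention, not a standard):

```shell
# Sketch: warn when README.md outgrows the 300-line convention above.
if [ -f README.md ]; then
  lines=$(wc -l < README.md)
  if [ "$lines" -le 300 ]; then
    echo "README.md OK ($lines lines)"
  else
    echo "README.md is $lines lines; move detail to the wiki"
  fi
else
  echo "no README.md in this directory"
fi
```

Wire it into CI as a warning first; a hard failure tends to get the threshold deleted rather than the README trimmed.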

---

## 2. Architecture Decision Records (ADRs)

An ADR captures a significant technical decision: what you chose, why, and what you rejected. The value is in the "why" — future engineers need to understand the context that led to a choice before they can evaluate whether it still makes sense.

**ADR template** (based on Michael Nygard's format):

```markdown
# ADR-0012: Use PostgreSQL for primary storage instead of DynamoDB

**Date**: 2026-02-15  
**Status**: Accepted  
**Deciders**: Alice Chen (Tech Lead), Bob Patel (Backend), Carol Smith (CTO)

## Context

We're building the core transaction ledger for our payments platform. We need to decide on the primary data store before the team scales.

Our initial spike used DynamoDB (the team has AWS experience and it scales horizontally). After 3 weeks of development, we hit friction:

- Multi-item transactions require client-side transaction logic
- Complex queries for the reconciliation reports required full table scans
- The team spent more time fighting DynamoDB data modeling than building features

The payments domain requires ACID guarantees for transfers, and our analytical queries (reconciliation, reporting) are fundamentally relational.

## Decision

We will use PostgreSQL (via RDS Aurora) as the primary data store.

## Rationale

1. ACID transactions: Multi-step money movement (debit account A, credit account B, insert transaction record) must be atomic. PostgreSQL handles this natively.

2. Query complexity: Our reconciliation queries join 4–5 tables with aggregations. These are natural in SQL; painful in NoSQL.

3. Team expertise: 4 of 5 engineers have production PostgreSQL experience. Zero have production DynamoDB experience.

4. Horizontal scaling is not our bottleneck: At our projected scale (10M transactions/year), a single RDS instance handles the write load. We can add read replicas if needed.

## Alternatives Considered

DynamoDB: Rejected. ACID transactions are cumbersome, analytical queries are slow and expensive, and the team's productivity was measurably lower. Revisit if we reach a write scale that PostgreSQL can't handle.

MongoDB: Rejected. Same issues with transactions and analytical queries. Better fit for document-centric data, not relational transactional data.

CockroachDB: Interesting for global distribution, but adds operational complexity we don't need at our current scale. Revisit at 50+ engineers.

## Consequences

Positive:

- Team can move fast in a familiar environment
- ACID guarantees simplify application code
- Rich query support for reporting without a separate analytics system

Negative:

- Horizontal write scaling requires sharding or Citus if we outgrow Aurora's write capacity (unlikely in the next 2 years)
- We're tied to Aurora's managed cost model ($200–2,000/mo at our scale)

## Review Trigger

Revisit this decision if sustained write throughput exceeds 50,000 TPS or if Aurora costs exceed $5,000/month.
```

**Where to store ADRs**: `docs/decisions/` in the repository. Number them sequentially (`ADR-0001`, `ADR-0002`). Keep them in source control alongside the code they document.
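Sequential numbering is easy to automate. A minimal sketch that scaffolds the next ADR file, assuming the `docs/decisions/` layout above (the slug and front-matter fields are illustrative, not a standard):

```shell
# Sketch: create the next sequentially numbered ADR in docs/decisions/.
mkdir -p docs/decisions
# Find the highest existing ADR number (empty if none exist yet)
last=$(ls docs/decisions 2>/dev/null | grep -oE '^ADR-[0-9]{4}' | sort | tail -n1 \
  | grep -oE '[0-9]+' | sed 's/^0*//')
next=$(printf 'ADR-%04d' $(( ${last:-0} + 1 )))
slug="use-postgresql-for-primary-storage"   # illustrative title slug
printf '# %s: %s\n\n**Date**: %s\n**Status**: Proposed\n' "$next" "$slug" "$(date +%F)" \
  > "docs/decisions/$next-$slug.md"
echo "Created docs/decisions/$next-$slug.md"
```

Tools such as `adr-tools` do the same job with more polish, but a ten-line script keeps the convention visible in the repo itself.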

---

## 3. Runbooks

A runbook is an operational procedure for a specific, recurring scenario — most critically, incident response. The audience is an on-call engineer at 3 AM who may be sleep-deprived and stressed.

Runbook requirements:

- Specific, not general ("restart the API pods", not "investigate the API")
- Observable — tell the engineer what healthy looks like after each step
- Decision trees for non-obvious paths
- Links to dashboards, logs, and escalation contacts
- Written so a new team member on their first on-call shift can follow it

**Runbook template:**

````markdown
# Runbook: High Payment Failure Rate

**Alert**: `PaymentFailureRate > 5%` for 5 minutes  
**Severity**: P1 (customer-facing revenue impact)  
**On-call channel**: #incidents  
**Escalation**: @payments-lead → @cto

---

## 1. Assess Scope

Check: [Grafana dashboard — Payment Errors](https://grafana.internal/payments)

Is the failure rate:

- **Spiking sharply (< 5 min)** → likely a deploy or Stripe outage, go to §2
- **Gradual increase (> 30 min)** → likely a resource issue, go to §3
- **Constant since a specific time** → correlate with deploy history, go to §4

---

## 2. Check Stripe Status

Open: https://status.stripe.com

If Stripe has an active incident:

1. Post in #incidents: "Stripe incident in progress: [link to Stripe status]"
2. Enable the payment retry banner in admin: https://admin.internal/settings
3. Notify @payments-lead
4. Wait for Stripe resolution — no action required on our side
5. Monitor the dashboard until the failure rate drops below 1%

If Stripe is healthy, continue to §3.

---

## 3. Check Service Health

Run in order:

```bash
# Check payment service pod status
kubectl get pods -n production -l app=payment-service

# Check recent error logs (last 100 lines)
kubectl logs -n production -l app=payment-service --tail=100 | grep ERROR

# Check database connections
kubectl exec -n production deployment/payment-service -- \
  node -e "require('./lib/db').checkConnections().then(console.log)"
```

- If pods are `CrashLoopBackOff` → go to Runbook: Pod crash recovery
- If database connections are exhausted → go to Runbook: Database connection pool exhaustion
- If logs show `Invalid API key` → Stripe API key rotation required, go to §5

---

## 4. Correlate with Recent Deploys

Check deploy history:

```bash
kubectl rollout history deployment/payment-service -n production
```

If a deploy occurred in the last 30 minutes:

```bash
# Roll back to the previous version
kubectl rollout undo deployment/payment-service -n production

# Monitor for 5 minutes
watch kubectl get pods -n production -l app=payment-service
```

Verify the failure rate drops in Grafana. If it does, file an incident report and notify the deploy author.

---

## 5. Rotate Stripe API Key

⚠️ Do not rotate the key without confirming the current key is compromised or expired.

1. Generate a new restricted key in the Stripe dashboard → Developers → API Keys
2. Add it to AWS Secrets Manager: `production/payment-service/stripe-key`
3. Trigger a pod restart:

   ```bash
   kubectl rollout restart deployment/payment-service -n production
   ```

4. Verify the pods restart successfully and the failure rate drops

---

## Post-Incident

Within 24 hours:

- Write the incident timeline in the #incidents thread
- Create follow-up tickets for any identified gaps
- Update this runbook if steps were unclear or missing
````
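The post-incident write-up is more likely to happen if the timeline skeleton is one command away. A small sketch (the file naming and headings are assumptions for illustration, not part of the runbook above):

```shell
# Sketch: generate an incident-timeline skeleton ready to fill in
# and paste into the #incidents thread.
incident="high-payment-failure-rate"
out="incident-$(date +%F)-$incident.md"
cat > "$out" <<'EOF'
# Incident Timeline

- **Detected**: (alert fired, link to alert)
- **Acknowledged**: (who, when)
- **Mitigated**: (action taken, when)
- **Resolved**: (failure rate back under 1%, when)

## Follow-ups

- [ ] Tickets filed for identified gaps
- [ ] Runbook updated if steps were unclear or missing
EOF
echo "Wrote $out"
```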

---

## 4. RFCs (Request for Comments)

An RFC is a proposal for a significant technical change, shared before implementation begins. Its purpose is alignment — catching objections and alternatives before engineering time is spent, not after.

**RFC template:**

```markdown
# RFC: Replace BullMQ with Temporal for job orchestration

**Author**: Alice Chen  
**Date**: 2026-03-01  
**Status**: Draft → Review → Accepted / Rejected  
**Discussion**: #eng-architecture Slack thread  
**Review deadline**: 2026-03-14

---

## Problem

Our current BullMQ-based job system handles 50K jobs/day. Three recurring issues:

1. Retry logic is duplicated across 12 job types — each implements its own backoff
2. Long-running jobs fail invisibly when the worker pod restarts mid-execution
3. Job visibility is poor — we have no way to trace why a specific order's notification failed without reading raw Redis data

These issues have caused 3 customer-visible incidents in Q1 2026.

## Proposed Solution

Replace BullMQ with Temporal.io for job orchestration.

Temporal provides:

- Durable execution (jobs survive worker restarts)
- Built-in retry with configurable backoff per step
- Full execution history for debugging
- Workflow composition (jobs that spawn sub-jobs)

## Migration Plan

1. Stand up a Temporal cluster on ECS (2 weeks)
2. Migrate email notification jobs (canary — lowest risk) (1 week)
3. Monitor for 2 weeks, validate behavior
4. Migrate remaining job types in order of business impact (6 weeks)
5. Decommission BullMQ (1 week)

Total: ~12 weeks, 1 engineer

## Tradeoffs and Risks

Operational overhead: Temporal requires its own cluster (Cassandra backend or cloud-managed). Estimated $300–500/month on AWS.

Learning curve: Temporal's workflow programming model is different from simple job queues. Estimate 1–2 weeks of ramp-up per engineer.

Migration risk: Running two systems in parallel during the migration adds complexity.

## Alternatives Considered

Fix BullMQ: Would address the retry and visibility issues. Rejected — it doesn't solve the durable-execution problem, and the visibility tooling would be a significant custom build.

AWS Step Functions: Managed, no cluster to operate. Rejected — per-state-transition pricing becomes expensive at our job volume, and the visual workflow editor doesn't fit our code-first workflow.

## Open Questions

1. Should we use Temporal Cloud ($0.15/action) or self-managed? At our volume, self-managed is ~$400/month vs $2,000–3,000/month on Temporal Cloud.
2. What's the rollback plan if the Temporal cluster has issues post-migration?

## Feedback Requested

Please review by March 14. Specific questions:

- Does anyone see risks not covered above?
- Is the migration timeline realistic given current sprint commitments?
- Any experience with Temporal at similar scale?
```

---

## Documentation Tools

| Tool | Best For | Cost |
|---|---|---|
| **Notion** | Team wikis, ADRs, internal docs | Free–$16/user/mo |
| **Confluence** | Enterprise teams, Jira-integrated | $5–10/user/mo |
| **GitHub/GitLab Wikis** | Docs-as-code near the codebase | Free |
| **Docusaurus** | Developer-facing external docs | Free (self-hosted) |
| **Mintlify** | API docs, public docs sites | $150–500/mo |
| **Notion + ADRs in repo** | Most engineering teams | $8–16/user/mo |

For internal documentation: start with Notion or Confluence. For external API documentation: Mintlify or Docusaurus.
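If ADRs live in the repository (the docs-as-code row above), a small CI check can keep them honest. A sketch, assuming the `docs/decisions/` layout from the ADR section and the convention that every ADR carries a `**Status**` line:

```shell
# Sketch: count ADRs that are missing a **Status** line.
missing=0
for f in docs/decisions/ADR-*.md; do
  [ -e "$f" ] || continue    # glob matched nothing: no ADRs yet
  grep -q '^\*\*Status\*\*' "$f" || { echo "missing Status: $f"; missing=$((missing+1)); }
done
echo "$missing ADR(s) missing a Status line"
```

The same loop extends to other conventions worth enforcing (a Date line, a non-empty Context section) before they drift.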

---

## Working With Viprasol

When we join client projects, we often find implicit knowledge — architectural decisions made years ago with no record of why, operational procedures that exist only in one engineer's memory. Our engagement methodology includes documentation sprints that externalize this knowledge before it becomes a critical dependency.

For teams building developer platforms or public APIs, we also write and maintain technical documentation as part of our delivery.

Talk to our team about documentation and knowledge management.

