
Technical Writing for Developers: Docs, ADRs, Runbooks, and RFCs That Get Read

Write technical documentation that engineers actually use — API docs, architecture decision records, runbooks, and RFCs. Includes templates, examples, and tools.

Viprasol Tech Team
March 28, 2026
11 min read


Good documentation is a force multiplier. A well-written runbook reduces a 4 AM incident from 90 minutes to 15. A clear ADR prevents relitigating the same architectural debate six months later. An accurate API reference cuts support requests for your internal platform in half.

Bad documentation — incomplete, outdated, or so vague it could describe anything — is worse than no documentation. Engineers learn to ignore it, and the cost is paid in onboarding time, repeated mistakes, and institutional knowledge locked in individuals' heads.

This guide covers the four document types that provide the most value for engineering teams, with templates and real examples.


## The Documentation Hierarchy

Different documentation serves different readers at different times:

| Type | Reader | When | Goal |
|---|---|---|---|
| README | Anyone new | First contact | Orient, not overwhelm |
| API Reference | Integrating developers | During development | Accurate, searchable, complete |
| ADR | Future team members | When revisiting a decision | Understand why, not just what |
| Runbook | On-call engineer | 4 AM incident | Execute under pressure |
| RFC | Engineering team | Before major changes | Alignment before commitment |
| Architecture docs | Team + stakeholders | Onboarding, planning | Big-picture understanding |

The most common mistake is writing for the wrong reader: deeply technical ADR content dropped into a customer-facing knowledge base, or a README that explains the architecture but skips "how to run this locally."


## 1. READMEs That Work

A README's job is not to document everything. It has one job: get a new contributor from zero to running locally in under 20 minutes.

**README template:**

````markdown
# Service Name

One-sentence description of what this service does and who uses it.

## Prerequisites

- Node.js 20+
- PostgreSQL 16
- Redis 7

## Setup

```bash
# Clone and install
git clone https://github.com/org/service-name
cd service-name
npm install

# Configure environment
cp .env.example .env
# Edit .env — see .env.example for required values

# Run database migrations
npm run db:migrate

# Start development server
npm run dev
```

The app runs at http://localhost:3000.

## Testing

```bash
npm test                  # Unit tests
npm run test:integration  # Integration tests (requires Postgres + Redis)
npm run test:e2e          # End-to-end (requires running app)
```

## Key Concepts

Brief explanation of the 2–3 things a new contributor needs to understand to work on this. Not the whole architecture — just what makes this service different from a standard REST API.

## Directory Structure

```
src/
  routes/     API route handlers
  services/   Business logic
  models/     Database models (Prisma)
  lib/        Shared utilities
  workers/    Background job processors
```

## Deployment

Link to the deployment runbook or describe the deploy command.

## Who to Ask

- Architecture questions: @alice
- On-call rotation: #on-call Slack channel
- Incident history: PagerDuty dashboard (link)
````

Keep it short. If the README is more than 300 lines, it's a wiki article, not a README; move the detail out and link to it.
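The 300-line rule can be enforced mechanically. A small CI-style sketch, assuming the check runs from the repository root (the threshold comes from this article; treat it as a team convention, not a standard):

```shell
# Sketch: warn when README.md outgrows the 300-line convention above.
if [ -f README.md ]; then
  lines=$(wc -l < README.md)
  if [ "$lines" -le 300 ]; then
    echo "README.md OK ($lines lines)"
  else
    echo "README.md is $lines lines; move detail to the wiki"
  fi
else
  echo "no README.md in this directory"
fi
```

Wire it into CI as a warning first; a hard failure tends to get the threshold deleted rather than the README trimmed.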

---

## 2. Architecture Decision Records (ADRs)

An ADR captures a significant technical decision: what you chose, why, and what you rejected. The value is in the "why" — future engineers need to understand the context that led to a choice before they can evaluate whether it still makes sense.

**ADR template** (based on Michael Nygard's format):

```markdown
# ADR-0012: Use PostgreSQL for primary storage instead of DynamoDB

**Date**: 2026-02-15  
**Status**: Accepted  
**Deciders**: Alice Chen (Tech Lead), Bob Patel (Backend), Carol Smith (CTO)

## Context

We're building the core transaction ledger for our payments platform. We need to decide on the primary data store before the team scales.

Our initial spike used DynamoDB (the team has AWS experience and it scales horizontally). After 3 weeks of development, we hit friction:

- Multi-item transactions require client-side transaction logic
- Complex queries for the reconciliation reports required full table scans
- The team spent more time fighting DynamoDB data modeling than building features

The payments domain requires ACID guarantees for transfers, and our analytical queries (reconciliation, reporting) are fundamentally relational.

## Decision

We will use PostgreSQL (via RDS Aurora) as the primary data store.

## Rationale

1. ACID transactions: Multi-step money movement (debit account A, credit account B, insert transaction record) must be atomic. PostgreSQL handles this natively.

2. Query complexity: Our reconciliation queries join 4–5 tables with aggregations. These are natural in SQL; painful in NoSQL.

3. Team expertise: 4 of 5 engineers have production PostgreSQL experience. Zero have production DynamoDB experience.

4. Horizontal scaling is not our bottleneck: At our projected scale (10M transactions/year), a single RDS instance handles the write load. We can add read replicas if needed.

## Alternatives Considered

DynamoDB: Rejected. ACID transactions are cumbersome, analytical queries are slow and expensive, and the team's productivity was measurably lower. Revisit if we reach a write scale that PostgreSQL can't handle.

MongoDB: Rejected. Same issues with transactions and analytical queries. Better fit for document-centric data, not relational transactional data.

CockroachDB: Interesting for global distribution, but adds operational complexity we don't need at our current scale. Revisit at 50+ engineers.

## Consequences

Positive:

- Team can move fast in a familiar environment
- ACID guarantees simplify application code
- Rich query support for reporting without a separate analytics system

Negative:

- Horizontal write scaling requires sharding or Citus if we outgrow Aurora's write capacity (unlikely in the next 2 years)
- We're tied to Aurora's managed cost model ($200–2,000/mo at our scale)

## Review Trigger

Revisit this decision if sustained write throughput exceeds 50,000 TPS or if Aurora costs exceed $5,000/month.
```

**Where to store ADRs**: `docs/decisions/` in the repository. Number them sequentially (`ADR-0001`, `ADR-0002`). Keep them in source control alongside the code they document.
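Sequential numbering is easy to automate. A minimal sketch that scaffolds the next ADR file, assuming the `docs/decisions/` layout above (the slug and front-matter fields are illustrative, not a standard):

```shell
# Sketch: create the next sequentially numbered ADR in docs/decisions/.
mkdir -p docs/decisions
# Find the highest existing ADR number (empty if none exist yet)
last=$(ls docs/decisions 2>/dev/null | grep -oE '^ADR-[0-9]{4}' | sort | tail -n1 \
  | grep -oE '[0-9]+' | sed 's/^0*//')
next=$(printf 'ADR-%04d' $(( ${last:-0} + 1 )))
slug="use-postgresql-for-primary-storage"   # illustrative title slug
printf '# %s: %s\n\n**Date**: %s\n**Status**: Proposed\n' "$next" "$slug" "$(date +%F)" \
  > "docs/decisions/$next-$slug.md"
echo "Created docs/decisions/$next-$slug.md"
```

Tools such as `adr-tools` do the same job with more polish, but a ten-line script keeps the convention visible in the repo itself.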

---

## 3. Runbooks

A runbook is an operational procedure for a specific, recurring scenario — most critically, incident response. The audience is an on-call engineer at 3 AM who may be sleep-deprived and stressed.

Runbook requirements:

- Specific, not general ("restart the API pods", not "investigate the API")
- Observable — tell the engineer what healthy looks like after each step
- Decision trees for non-obvious paths
- Links to dashboards, logs, and escalation contacts
- Written so a new team member on their first on-call shift can follow it

**Runbook template:**

````markdown
# Runbook: High Payment Failure Rate

**Alert**: `PaymentFailureRate > 5%` for 5 minutes  
**Severity**: P1 (customer-facing revenue impact)  
**On-call channel**: #incidents  
**Escalation**: @payments-lead → @cto

---

## 1. Assess Scope

Check: [Grafana dashboard — Payment Errors](https://grafana.internal/payments)

Is the failure rate:

- **Spiking sharply (< 5 min)** → likely a deploy or Stripe outage, go to §2
- **Gradual increase (> 30 min)** → likely a resource issue, go to §3
- **Constant since a specific time** → correlate with deploy history, go to §4

---

## 2. Check Stripe Status

Open: https://status.stripe.com

If Stripe has an active incident:

1. Post in #incidents: "Stripe incident in progress: [link to Stripe status]"
2. Enable the payment retry banner in admin: https://admin.internal/settings
3. Notify @payments-lead
4. Wait for Stripe resolution — no action required on our side
5. Monitor the dashboard until the failure rate drops below 1%

If Stripe is healthy, continue to §3.

---

## 3. Check Service Health

Run in order:

```bash
# Check payment service pod status
kubectl get pods -n production -l app=payment-service

# Check recent error logs (last 100 lines)
kubectl logs -n production -l app=payment-service --tail=100 | grep ERROR

# Check database connections
kubectl exec -n production deployment/payment-service -- \
  node -e "require('./lib/db').checkConnections().then(console.log)"
```

- If pods are `CrashLoopBackOff` → go to Runbook: Pod crash recovery
- If database connections are exhausted → go to Runbook: Database connection pool exhaustion
- If logs show `Invalid API key` → Stripe API key rotation required, go to §5

---

## 4. Correlate with Recent Deploys

Check deploy history:

```bash
kubectl rollout history deployment/payment-service -n production
```

If a deploy occurred in the last 30 minutes:

```bash
# Roll back to the previous version
kubectl rollout undo deployment/payment-service -n production

# Monitor for 5 minutes
watch kubectl get pods -n production -l app=payment-service
```

Verify the failure rate drops in Grafana. If it does, file an incident report and notify the deploy author.

---

## 5. Rotate Stripe API Key

⚠️ Do not rotate the key without confirming the current key is compromised or expired.

1. Generate a new restricted key in the Stripe dashboard → Developers → API Keys
2. Add it to AWS Secrets Manager: `production/payment-service/stripe-key`
3. Trigger a pod restart:

   ```bash
   kubectl rollout restart deployment/payment-service -n production
   ```

4. Verify the pods restart successfully and the failure rate drops

---

## Post-Incident

Within 24 hours:

- Write the incident timeline in the #incidents thread
- Create follow-up tickets for any identified gaps
- Update this runbook if steps were unclear or missing
````
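The post-incident write-up is more likely to happen if the timeline skeleton is one command away. A small sketch (the file naming and headings are assumptions for illustration, not part of the runbook above):

```shell
# Sketch: generate an incident-timeline skeleton ready to fill in
# and paste into the #incidents thread.
incident="high-payment-failure-rate"
out="incident-$(date +%F)-$incident.md"
cat > "$out" <<'EOF'
# Incident Timeline

- **Detected**: (alert fired, link to alert)
- **Acknowledged**: (who, when)
- **Mitigated**: (action taken, when)
- **Resolved**: (failure rate back under 1%, when)

## Follow-ups

- [ ] Tickets filed for identified gaps
- [ ] Runbook updated if steps were unclear or missing
EOF
echo "Wrote $out"
```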

---

## 4. RFCs (Request for Comments)

An RFC is a proposal for a significant technical change, shared before implementation begins. Its purpose is alignment — catching objections and alternatives before engineering time is spent, not after.

**RFC template:**

```markdown
# RFC: Replace BullMQ with Temporal for job orchestration

**Author**: Alice Chen  
**Date**: 2026-03-01  
**Status**: Draft → Review → Accepted / Rejected  
**Discussion**: #eng-architecture Slack thread  
**Review deadline**: 2026-03-14

---

## Problem

Our current BullMQ-based job system handles 50K jobs/day. Three recurring issues:

1. Retry logic is duplicated across 12 job types — each implements its own backoff
2. Long-running jobs fail invisibly when the worker pod restarts mid-execution
3. Job visibility is poor — we have no way to trace why a specific order's notification failed without reading raw Redis data

These issues have caused 3 customer-visible incidents in Q1 2026.

## Proposed Solution

Replace BullMQ with Temporal.io for job orchestration.

Temporal provides:

- Durable execution (jobs survive worker restarts)
- Built-in retry with configurable backoff per step
- Full execution history for debugging
- Workflow composition (jobs that spawn sub-jobs)

## Migration Plan

1. Stand up a Temporal cluster on ECS (2 weeks)
2. Migrate email notification jobs (canary — lowest risk) (1 week)
3. Monitor for 2 weeks, validate behavior
4. Migrate remaining job types in order of business impact (6 weeks)
5. Decommission BullMQ (1 week)

Total: ~12 weeks, 1 engineer

## Tradeoffs and Risks

Operational overhead: Temporal requires its own cluster (Cassandra backend or cloud-managed). Estimated $300–500/month on AWS.

Learning curve: Temporal's workflow programming model is different from simple job queues. Estimate 1–2 weeks of ramp-up per engineer.

Migration risk: Running two systems in parallel during the migration adds complexity.

## Alternatives Considered

Fix BullMQ: Would address the retry and visibility issues. Rejected — it doesn't solve the durable-execution problem, and the visibility tooling would be a significant custom build.

AWS Step Functions: Managed, no cluster to operate. Rejected — per-state-transition pricing becomes expensive at our job volume, and the visual workflow editor doesn't fit our code-first workflow.

## Open Questions

1. Should we use Temporal Cloud ($0.15/action) or self-managed? At our volume, self-managed is ~$400/month vs $2,000–3,000/month on Temporal Cloud.
2. What's the rollback plan if the Temporal cluster has issues post-migration?

## Feedback Requested

Please review by March 14. Specific questions:

- Does anyone see risks not covered above?
- Is the migration timeline realistic given current sprint commitments?
- Any experience with Temporal at similar scale?
```

---

## Documentation Tools

| Tool | Best For | Cost |
|---|---|---|
| **Notion** | Team wikis, ADRs, internal docs | Free–$16/user/mo |
| **Confluence** | Enterprise teams, Jira-integrated | $5–10/user/mo |
| **GitHub/GitLab Wikis** | Docs-as-code near the codebase | Free |
| **Docusaurus** | Developer-facing external docs | Free (self-hosted) |
| **Mintlify** | API docs, public docs sites | $150–500/mo |
| **Notion + ADRs in repo** | Most engineering teams | $8–16/user/mo |

For internal documentation: start with Notion or Confluence. For external API documentation: Mintlify or Docusaurus.
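If ADRs live in the repository (the docs-as-code row above), a small CI check can keep them honest. A sketch, assuming the `docs/decisions/` layout from the ADR section and the convention that every ADR carries a `**Status**` line:

```shell
# Sketch: count ADRs that are missing a **Status** line.
missing=0
for f in docs/decisions/ADR-*.md; do
  [ -e "$f" ] || continue    # glob matched nothing: no ADRs yet
  grep -q '^\*\*Status\*\*' "$f" || { echo "missing Status: $f"; missing=$((missing+1)); }
done
echo "$missing ADR(s) missing a Status line"
```

The same loop extends to other conventions worth enforcing (a Date line, a non-empty Context section) before they drift.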

---

## Working With Viprasol

When we join client projects, we often find implicit knowledge — architectural decisions made years ago with no record of why, operational procedures that exist only in one engineer's memory. Our engagement methodology includes documentation sprints that externalize this knowledge before it becomes a critical dependency.

For teams building developer platforms or public APIs, we also write and maintain technical documentation as part of our delivery.

Talk to our team about documentation and knowledge management.

