
DevOps Best Practices: CI/CD, Monitoring, and Infrastructure Automation in 2026

Implement DevOps best practices with GitHub Actions CI/CD, Prometheus monitoring, Terraform infrastructure, and GitOps workflows. Real configs and cost tables included.

Viprasol Tech Team
March 20, 2026
13 min read

DevOps Best Practices: CI/CD, Monitoring, and Infrastructure Automation in 2026

DevOps is the discipline of reducing the time and risk between writing code and running it in production. The practices here aren't theoretical — they're the specific implementations we use for clients handling real production traffic, and the gaps we've seen cause outages or slow engineering teams to a crawl.


The Four Pillars of Mature DevOps

  1. Continuous Integration: Every code change is built and tested automatically
  2. Continuous Deployment: Passing changes deploy to production without manual intervention
  3. Infrastructure as Code: All infrastructure is version-controlled and reproducible
  4. Observability: You know what's happening in production before users tell you

Most teams have partial implementations of each. The compounding value comes from having all four working together.


Pillar 1: CI That Actually Works

A CI pipeline that takes 45 minutes to run is nearly useless — developers stop waiting for it and merge anyway. The goal is a pipeline that completes in under 10 minutes and catches real bugs before they reach main.

GitHub Actions workflow with parallel jobs:

# .github/workflows/ci.yml
name: CI

on:
  pull_request:
    branches: [main, develop]
  push:
    branches: [main]  # needed so the build job's push-to-main condition can ever be true

env:
  NODE_VERSION: '20'

jobs:
  # Fast checks — no dependencies, so failures surface within minutes
  lint-types:
    name: Lint & Type Check
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'
      - run: npm ci
      - run: npm run lint
      - run: npm run type-check

  # Unit tests — parallel by directory
  unit-tests:
    name: Unit Tests
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4]  # Split tests across 4 runners
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'
      - run: npm ci
      - run: npx jest --shard=${{ matrix.shard }}/4 --coverage --coverageReporters=json
      - uses: actions/upload-artifact@v4
        with:
          name: coverage-${{ matrix.shard }}
          path: coverage/coverage-final.json

  # Integration tests — needs postgres
  integration-tests:
    name: Integration Tests
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_DB: testdb
          POSTGRES_USER: testuser
          POSTGRES_PASSWORD: testpass
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'
      - run: npm ci
      - run: npm run db:migrate:test
        env:
          DATABASE_URL: postgresql://testuser:testpass@localhost:5432/testdb
      - run: npm run test:integration
        env:
          DATABASE_URL: postgresql://testuser:testpass@localhost:5432/testdb

  # Security scan
  security:
    name: Security Audit
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm audit --audit-level=high
      - uses: aquasecurity/trivy-action@master  # consider pinning to a tagged release for reproducible builds
        with:
          scan-type: 'fs'
          severity: 'CRITICAL,HIGH'

  # Build Docker image
  build:
    name: Build & Push Image
    runs-on: ubuntu-latest
    needs: [lint-types, unit-tests, integration-tests, security]
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - uses: aws-actions/amazon-ecr-login@v2
      - name: Build and push
        id: build-push  # id is required for later steps/jobs to read outputs.IMAGE_TAG
        run: |
          IMAGE_TAG=${{ github.sha }}
          docker build -t ${{ secrets.ECR_REGISTRY }}/app:$IMAGE_TAG .
          docker push ${{ secrets.ECR_REGISTRY }}/app:$IMAGE_TAG
          echo "IMAGE_TAG=$IMAGE_TAG" >> "$GITHUB_OUTPUT"

Key CI principles:

  • Fail fast: Lint and type checks run first, before slower tests
  • Parallel where possible: Test sharding cuts runtime proportionally
  • Cache aggressively: npm ci with cache cuts 60–90s per run
  • Security in CI: Catching vulnerabilities before production is far cheaper than after
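
The sharded unit-test jobs each upload a partial coverage file, so something has to stitch them back together. A hedged sketch of a follow-up job using `nyc merge` (the `merge-coverage` job name and `shards/` paths are ours, not from the workflow above):

```yaml
  # Hypothetical follow-up job: merge per-shard coverage into one report
  merge-coverage:
    name: Merge Coverage
    runs-on: ubuntu-latest
    needs: [unit-tests]
    steps:
      - uses: actions/download-artifact@v4
        with:
          pattern: coverage-*   # fetches coverage-1 … coverage-4
          path: shards/         # each artifact lands in shards/coverage-N/
      - name: Merge shards and report
        run: |
          mkdir -p merged .nyc_output
          # give each shard's coverage-final.json a unique name before merging
          for d in shards/*/; do
            cp "$d/coverage-final.json" "merged/$(basename "$d").json"
          done
          npx nyc merge merged/ .nyc_output/out.json
          npx nyc report --reporter=text-summary  # reads .nyc_output/ by default
```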

☁️ Is Your Cloud Costing Too Much?

Most teams overspend 30–40% on cloud — wrong instance types, no reserved pricing, bloated storage. We audit, right-size, and automate your infrastructure.

  • AWS, GCP, Azure certified engineers
  • Infrastructure as Code (Terraform, CDK)
  • Docker, Kubernetes, GitHub Actions CI/CD
  • Typical audit recovers $500–$3,000/month in savings

Pillar 2: Continuous Deployment with Zero Downtime

ECS blue/green deployment with CodeDeploy:

# .github/workflows/deploy.yml
name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  deploy:
    needs: [build]  # NOTE: 'needs' cannot reference jobs in other workflow files; keep the build job in this workflow, or trigger this one via workflow_run
    runs-on: ubuntu-latest
    environment: production  # Requires manual approval for prod
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Update ECS task definition
        id: task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: task-definition.json
          container-name: app
          image: ${{ secrets.ECR_REGISTRY }}/app:${{ github.sha }}

      - name: Deploy to ECS with blue/green
        uses: aws-actions/amazon-ecs-deploy-task-definition@v1
        with:
          task-definition: ${{ steps.task-def.outputs.task-definition }}
          service: production-app
          cluster: production
          wait-for-service-stability: true
          codedeploy-appspec: appspec.yaml
          codedeploy-application: production-app
          codedeploy-deployment-group: production-blue-green

      - name: Run smoke tests
        run: |
          sleep 30  # Wait for deployment to propagate
          curl -f https://api.yourapp.com/health || exit 1

      - name: Notify on failure
        if: failure()
        uses: slackapi/slack-github-action@v1
        with:
          channel-id: 'deployments'
          slack-message: "❌ Production deployment failed for ${{ github.sha }}"
        env:
          SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}  # v1 reads the token from env, not inputs
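
The deploy step references an appspec.yaml that isn't shown above. For ECS blue/green via CodeDeploy it is a small file; a sketch, where the container name and port are assumptions about your task definition:

```yaml
# appspec.yaml — tells CodeDeploy which container receives traffic
# during the blue/green target-group swap
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: <TASK_DEFINITION>  # placeholder, filled in by the deploy action
        LoadBalancerInfo:
          ContainerName: "app"
          ContainerPort: 3000
```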

Pillar 3: GitOps with Terraform

Infrastructure changes should go through the same review process as code changes. GitOps means your Git repository is the single source of truth for infrastructure state.

# .github/workflows/terraform.yml
name: Terraform

on:
  pull_request:
    paths: ['terraform/**']
  push:
    branches: [main]
    paths: ['terraform/**']

jobs:
  terraform:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: terraform/
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: '1.7.0'
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - run: terraform init
      - run: terraform validate
      - run: terraform fmt -check

      - name: Terraform Plan
        id: plan
        run: terraform plan -out=tfplan -no-color
        continue-on-error: true  # Comment plan even on failure

      - name: Comment PR with plan
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const output = `#### Terraform Plan 📋
            \`\`\`
            ${{ steps.plan.outputs.stdout }}
            \`\`\``;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            });

      - name: Terraform Apply
        # plan has continue-on-error, so also check its outcome before applying
        if: github.ref == 'refs/heads/main' && github.event_name == 'push' && steps.plan.outcome == 'success'
        run: terraform apply -auto-approve tfplan
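
For this workflow to be safe when two runs overlap, `terraform init` needs a remote backend with state locking, which the config above assumes but doesn't show. A sketch using the S3 backend (bucket and table names are placeholders):

```hcl
# terraform/backend.tf
terraform {
  backend "s3" {
    bucket         = "yourco-terraform-state"      # pre-created, versioned S3 bucket
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"             # lock table with a LockID partition key
    encrypt        = true
  }
}
```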

⚙️ DevOps Done Right — Zero Downtime, Full Automation

Ship faster without breaking things. We build CI/CD pipelines, monitoring stacks, and auto-scaling infrastructure that your team can actually maintain.

  • Staging + production environments with feature flags
  • Automated security scanning in the pipeline
  • Uptime monitoring + alerting + runbook automation
  • On-call support handover docs included

Pillar 4: Observability Stack

A production system without observability is a black box. You need metrics (what's happening), logs (what went wrong), and traces (why a specific request was slow).

Prometheus alerting rules:

# prometheus/alerts.yml
groups:
  - name: application
    rules:
      - alert: HighErrorRate
        expr: |
          rate(http_requests_total{status=~"5.."}[5m])
          / rate(http_requests_total[5m]) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Error rate above 5% for {{ $labels.service }}"
          description: "Current error rate: {{ $value | humanizePercentage }}"

      - alert: SlowP99Latency
        expr: |
          histogram_quantile(0.99, 
            rate(http_request_duration_seconds_bucket[5m])
          ) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P99 latency above 2s for {{ $labels.service }}"

      - alert: HighMemoryUsage
        expr: |
          container_memory_usage_bytes{container!=""}
          / container_spec_memory_limit_bytes{container!=""} > 0.85
        for: 10m
        labels:
          severity: warning

  - name: database
    rules:
      - alert: PostgresSlowQueries
        expr: pg_stat_activity_max_tx_duration{state="active"} > 30
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "PostgreSQL query running for >30s"

      - alert: PostgresConnectionsHigh
        expr: |
          pg_stat_database_numbackends / pg_settings_max_connections > 0.8
        for: 5m
        labels:
          severity: critical
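
The SlowP99Latency alert works because Prometheus histograms export cumulative `_bucket` series with `le` labels. As a self-contained illustration of that bookkeeping (a minimal sketch, not any particular client library's API — in a real service you'd use an instrumentation library):

```typescript
// Sketch of Prometheus-style cumulative histogram buckets: each bucket
// counts all observations <= its upper bound, which is what the
// http_request_duration_seconds_bucket series in the alert contains.
class LatencyHistogram {
  private counts: Map<number, number>; // upper bound (le) -> cumulative count
  private readonly bounds: number[];
  sum = 0;   // the _sum series
  total = 0; // the _count series

  constructor(bounds: number[]) {
    this.bounds = [...bounds].sort((a, b) => a - b);
    this.counts = new Map(this.bounds.map((b) => [b, 0]));
  }

  observe(seconds: number): void {
    this.sum += seconds;
    this.total += 1;
    // increment every bucket whose bound is >= the observed value
    for (const b of this.bounds) {
      if (seconds <= b) this.counts.set(b, (this.counts.get(b) ?? 0) + 1);
    }
  }

  // Cumulative count, like http_request_duration_seconds_bucket{le="2"}
  bucket(le: number): number {
    return this.counts.get(le) ?? 0;
  }
}

const h = new LatencyHistogram([0.1, 0.5, 1, 2]);
[0.05, 0.3, 0.7, 1.5].forEach((s) => h.observe(s));
console.log(h.bucket(0.5)); // 2 — buckets are cumulative, so le="0.5" includes le="0.1"
console.log(h.bucket(2));   // 4
```

`histogram_quantile` interpolates across these cumulative counts, which is why the alert rate()s the `_bucket` series rather than a raw latency gauge.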

Structured logging in Node.js (Pino):

// lib/logger.ts
import crypto from 'node:crypto'; // randomUUID is global from Node 19+, but the explicit import works everywhere
import pino from 'pino';

export const logger = pino({
  level: process.env.LOG_LEVEL ?? 'info',
  formatters: {
    level: (label) => ({ level: label }),
  },
  base: {
    service: process.env.SERVICE_NAME,
    version: process.env.APP_VERSION,
    env: process.env.NODE_ENV,
  },
  // In production, ship JSON logs to CloudWatch/Datadog/Loki
  // In dev, pretty-print for readability
  transport: process.env.NODE_ENV === 'development'
    ? { target: 'pino-pretty', options: { colorize: true } }
    : undefined,
});

// Request context with trace ID
export function createRequestLogger(requestId: string, userId?: string) {
  return logger.child({ requestId, userId });
}

// Usage (Fastify shown — any framework with request hooks works the same way)
app.addHook('onRequest', async (request, reply) => {
  const requestId = request.headers['x-request-id'] as string 
    ?? crypto.randomUUID();
  request.log = createRequestLogger(requestId, request.headers['x-user-id'] as string);
  reply.header('x-request-id', requestId);
});
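
Structured logs pay off at query time. As an illustration, a CloudWatch Logs Insights query over these JSON logs might look like this (field names match the Pino config above; `api` is an assumed service name):

```
fields @timestamp, requestId, msg
| filter level = "error" and service = "api"
| sort @timestamp desc
| limit 50
```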

DevOps Toolchain Reference

Category            Tool                  Cost                 When
CI/CD               GitHub Actions        Free–$48/mo          Majority of teams
CI/CD               GitLab CI             Free–$19/user/mo     Self-hosted or GitLab users
Container Registry  AWS ECR               $0.10/GB             AWS-native
IaC                 Terraform             Free (open source)   Any cloud
Secrets             AWS Secrets Manager   $0.40/secret/mo      Production secrets
Monitoring          Prometheus + Grafana  Free (self-hosted)   Kubernetes environments
Monitoring          Datadog               $15–23/host/mo       Managed, full-stack
Logging             CloudWatch            $0.50/GB ingested    AWS-native
Logging             Grafana Loki          Free (self-hosted)   Kubernetes + cost-sensitive
Alerting            PagerDuty             $21/user/mo          On-call rotation
APM + Tracing       OpenTelemetry         Free                 Any backend

Team Structure: DevOps vs SRE vs Platform Engineering

Role               Focus                                       When You Need It
DevOps Engineer    CI/CD pipelines, automation, IaC            10+ engineers, 2+ services
SRE                Reliability, SLOs, incident response        Mission-critical systems, 50+ engineers
Platform Engineer  Internal developer platforms, golden paths  100+ engineers, multiple teams

Most startups benefit from a part-time DevOps consultant or embedded DevOps engineer before reaching 20 engineers.


Cost of DevOps Maturity

Maturity Level  What's Included                            Monthly Cost
Basic           GitHub Actions CI, manual deploy           $20–50
Standard        CI/CD + Terraform + CloudWatch             $150–400
Advanced        Blue/green deploys + Prometheus + Datadog  $800–2,000
Enterprise      Platform engineering + SRE toolchain       $3,000–10,000+

Tooling cost is usually less than 5% of engineering salary cost at these stages. The ROI comes from engineering time saved and incidents avoided.


Working With Viprasol

We implement DevOps pipelines for product teams that want to ship faster without adding operational risk. That typically means a GitHub Actions CI/CD pipeline, Terraform infrastructure, structured logging, and alerting — delivered in 4–8 weeks, not months.

Our clients typically see deployment frequency increase 3–5× and mean time to recovery (MTTR) drop 60–80% after a DevOps implementation.

Talk to our DevOps team about your current setup.



About the Author

Viprasol Tech Team

Custom Software Development Specialists

The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.

MT4/MT5 EA Development · AI Agent Systems · SaaS Development · Algorithmic Trading

Need DevOps & Cloud Expertise?

Scale your infrastructure with confidence. AWS, GCP, Azure certified team.

Free consultation • No commitment • Response within 24 hours
