
AWS OpenSearch Analytics: Index Mappings, Aggregations, and Dashboards

Build analytics pipelines with AWS OpenSearch Service. Covers cluster setup with Terraform, index mapping design, aggregation queries for metrics, real-time log ingestion, and OpenSearch Dashboards.

Viprasol Tech Team
March 14, 2027
13 min read

AWS OpenSearch: Search and Analytics Architecture (2026)

Search and analytics capabilities separate good products from great ones. At Viprasol, we've moved beyond single-server Elasticsearch setups to use AWS OpenSearch for building search and analytics at scale. This post covers what we've learned about deploying, operating, and optimizing OpenSearch in production.

From Elasticsearch to AWS OpenSearch

For years, we deployed self-managed Elasticsearch clusters. We dealt with version upgrades, heap sizing, node management, and the endless parade of "how do I fix this" operational issues. When AWS released OpenSearch as a managed service, it fundamentally changed how we build search infrastructure.

OpenSearch (the AWS fork of Elasticsearch) removes the operational burden while maintaining the power of full-text search, log analytics, and complex aggregations. You don't manage nodes, patches, or scaling—AWS handles that. You focus on indexing strategy, query optimization, and business logic.

Understanding OpenSearch Architecture

OpenSearch consists of several components:

Nodes and Cluster Configuration

A typical production cluster has three types of nodes:

Code:

# Master nodes (manage cluster state)
- nodes: 3
  type: master
  storage: 20GB
  instance_type: t3.small.search

# Data nodes (store and search data)
- nodes: 3
  type: data
  storage: 500GB
  instance_type: r6g.large.search

# Coordinating nodes (route requests)
- nodes: 2
  type: coordinating
  storage: 0GB
  instance_type: m6g.large.search

Master nodes manage cluster state. Data nodes store indices and process searches. Coordinating nodes route requests without storing data.

Sharding Strategy

Shards determine how data is distributed and parallelized:

Code:

{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 2
  }
}

This configuration creates:

  • 5 primary shards (data split across 5 partitions)
  • 2 replica shards per primary (3 copies of each shard, 15 shards in total)
  • Can survive 2 node failures without data loss
  • Parallelizes searches across 5 shards

Number of shards should roughly match your data nodes. More shards = more parallelism but higher overhead.
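
The copy arithmetic is easy to get wrong, so here it is as a throwaway TypeScript sketch (function names are ours):

```typescript
// Total physical shards for an index: each primary gets `replicas`
// full copies, so the cluster stores primaries * (1 + replicas) shards.
function totalShardCopies(primaries: number, replicas: number): number {
  return primaries * (1 + replicas);
}

// Node failures the index can survive without data loss: one per replica.
function tolerableNodeFailures(replicas: number): number {
  return replicas;
}
```

With 5 primaries and 2 replicas that is 15 shards, and the index tolerates 2 simultaneous node failures.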


Setting Up AWS OpenSearch for Web Development

For web development projects requiring search, here's a production setup:

Code:

import { Client } from '@opensearch-project/opensearch';

const client = new Client({
  node: process.env.OPENSEARCH_ENDPOINT,
  auth: {
    username: process.env.OPENSEARCH_USERNAME,
    password: process.env.OPENSEARCH_PASSWORD
  },
  ssl: {
    rejectUnauthorized: true
  }
});

// Create an index with proper mapping
async function createSearchIndex(indexName: string) {
  const indexExists = await client.indices.exists({ index: indexName });
  
  if (!indexExists.body) {
    await client.indices.create({
      index: indexName,
      body: {
        settings: {
          number_of_shards: 5,
          number_of_replicas: 2,
          analysis: {
            analyzer: {
              text_analyzer: {
                type: 'custom',
                tokenizer: 'standard',
                filter: ['lowercase', 'stop']
              }
            }
          }
        },
        mappings: {
          properties: {
            id: { type: 'keyword' },
            title: { 
              type: 'text',
              analyzer: 'text_analyzer',
              fields: {
                keyword: { type: 'keyword' }
              }
            },
            description: {
              type: 'text',
              analyzer: 'text_analyzer'
            },
            category: { type: 'keyword' },
            price: { type: 'float' },
            created_at: { type: 'date' },
            updated_at: { type: 'date' }
          }
        }
      }
    });
  }
}

Indexing Strategies

Bulk Indexing

For large datasets, bulk operations are more efficient:

Code:

async function bulkIndexDocuments(docs: Array<any>) {
  const bulk = [];
  
  docs.forEach(doc => {
    // Action metadata
    bulk.push({
      index: {
        _index: 'products',
        _id: doc.id
      }
    });
    // Document
    bulk.push(doc);
  });

  // Transport options such as timeouts go in the second argument
  const response = await client.bulk(
    { body: bulk },
    { requestTimeout: 30000 }
  );

  // A bulk call can succeed overall while individual items fail
  if (response.body.errors) {
    console.error('Some bulk items failed to index');
  }
}

// Index 100k docs in batches
const batchSize = 1000;
for (let i = 0; i < allDocs.length; i += batchSize) {
  const batch = allDocs.slice(i, i + batchSize);
  await bulkIndexDocuments(batch);
}

Real-Time Indexing with Refresh Strategy

For SaaS development, documents need to be searchable quickly:

Code:

async function indexDocument(doc: any) {
  // Index and immediately refresh to make searchable
  await client.index({
    index: 'products',
    id: doc.id,
    body: doc,
    refresh: true // Makes document searchable immediately
  });
}

// For high-volume scenarios, batch the refresh instead
let refreshScheduled = false;

async function indexDocumentBatched(doc: any) {
  // Don't refresh on each document
  await client.index({
    index: 'products',
    id: doc.id,
    body: doc
  });

  // Schedule a single refresh for the whole batch rather than one
  // timer per document (or simply tune index.refresh_interval)
  if (!refreshScheduled) {
    refreshScheduled = true;
    setTimeout(async () => {
      await client.indices.refresh({ index: 'products' });
      refreshScheduled = false;
    }, 5000);
  }
}


Building Advanced Search Queries

Full-Text Search

Find documents matching query text across multiple fields:

Code:

async function searchProducts(query: string) {
  const response = await client.search({
    index: 'products',
    body: {
      query: {
        multi_match: {
          query: query,
          fields: ['title^2', 'description', 'category'],
          fuzziness: 'AUTO'
        }
      },
      size: 20,
      from: 0
    }
  });
  
  return response.body.hits.hits.map(hit => hit._source);
}

The ^2 on title doubles the weight of title matches when scoring, and fuzziness: 'AUTO' tolerates small typos based on term length.
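
The size and from parameters implement plain offset pagination; a small helper (our own naming) keeps the arithmetic in one place:

```typescript
// Translate a 1-based page number into OpenSearch's from/size offsets.
// Note: from + size cannot exceed index.max_result_window (10,000 by
// default), so deep pagination should switch to search_after.
function toPagination(page: number, perPage: number = 20): { from: number; size: number } {
  const safePage = Math.max(1, Math.floor(page));
  return { from: (safePage - 1) * perPage, size: perPage };
}
```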

Filtering with Aggregations

Combine search with faceted navigation:

Code:

async function searchWithFilters(
  query: string,
  category?: string,
  priceRange?: { min: number; max: number }
) {
  const filters = [];

  if (category) {
    filters.push({ term: { category: category } });
  }

  if (priceRange) {
    filters.push({
      range: {
        price: {
          gte: priceRange.min,
          lte: priceRange.max
        }
      }
    });
  }

  const response = await client.search({
    index: 'products',
    body: {
      query: {
        bool: {
          must: [
            {
              multi_match: {
                query: query,
                fields: ['title', 'description']
              }
            }
          ],
          filter: filters
        }
      },
      aggregations: {
        categories: {
          terms: {
            field: 'category',
            size: 10
          }
        },
        price_ranges: {
          range: {
            field: 'price',
            ranges: [
              { to: 50 },
              { from: 50, to: 100 },
              { from: 100, to: 500 },
              { from: 500 }
            ]
          }
        }
      },
      size: 20
    }
  });

  return {
    results: response.body.hits.hits.map(hit => hit._source),
    filters: {
      categories: response.body.aggregations.categories.buckets,
      prices: response.body.aggregations.price_ranges.buckets
    }
  };
}
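
Terms and range aggregations come back as arrays of { key, doc_count } buckets; a small mapper (the Facet shape is a hypothetical UI type of ours) flattens them for a faceted sidebar:

```typescript
interface Bucket { key: string | number; doc_count: number; }
interface Facet { value: string; count: number; }

// Convert raw aggregation buckets into UI-friendly facets,
// dropping empty buckets so the sidebar stays clean.
function bucketsToFacets(buckets: Bucket[]): Facet[] {
  return buckets
    .filter(b => b.doc_count > 0)
    .map(b => ({ value: String(b.key), count: b.doc_count }));
}
```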

Log Analytics and Monitoring

For cloud solutions, OpenSearch excels at log aggregation:

Code:

// Index application logs into a per-service, per-day index
async function indexLog(log: {
  timestamp: Date;
  level: string;
  service: string;
  message: string;
  trace_id?: string;
  error?: any;
}) {
  const dateString = log.timestamp.toISOString().slice(0, 10); // YYYY-MM-DD
  await client.index({
    index: `logs-${log.service}-${dateString}`,
    body: {
      '@timestamp': log.timestamp,
      level: log.level,
      service: log.service,
      message: log.message,
      trace_id: log.trace_id,
      error: log.error
    }
  });
}

// Find errors for a specific service
async function findErrors(service: string, hoursBack: number = 1) {
  return client.search({
    index: `logs-${service}-*`,
    body: {
      query: {
        bool: {
          must: [
            { term: { level: 'ERROR' } },
            {
              range: {
                '@timestamp': {
                  gte: `now-${hoursBack}h`
                }
              }
            }
          ]
        }
      },
      size: 100,
      sort: [{ '@timestamp': { order: 'desc' } }]
    }
  });
}
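
The per-service daily index names used above can be generated deterministically; a sketch assuming the logs-<service>-<date> convention (helper name is ours):

```typescript
// Build a daily index name such as "logs-api-2027.03.14".
// One index per service per day keeps deletes cheap: dropping a day
// of logs is a single index deletion, not a delete-by-query.
function dailyLogIndex(service: string, date: Date): string {
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, '0');
  const d = String(date.getUTCDate()).padStart(2, '0');
  return `logs-${service}-${y}.${m}.${d}`;
}
```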

Performance Optimization

Index Lifecycle Management

Automatically manage aging indices. OpenSearch's version of this feature is Index State Management (ISM); the policy below is a simplified sketch, as real ISM policies are JSON documents with explicit transitions between states:

Code:

policy:
  name: "log-policy"
  states:
    - name: "hot"
      actions:
        - rollover:
            max_size: "50GB"
            max_age: "1d"
        - set_priority:
            priority: 100

    - name: "warm"
      min_age: "7d"
      actions:
        - replica:
            number_of_replicas: 1
        - set_priority:
            priority: 50

    - name: "cold"
      min_age: "30d"
      actions:
        - replica:
            number_of_replicas: 0
        - set_priority:
            priority: 0

    - name: "delete"
      min_age: "90d"
      actions:
        - delete: {}

This policy:

  • Keeps hot data (recent) on fast nodes with replicas
  • Moves warm data to slower nodes with fewer replicas
  • Moves cold data to archive nodes with no replicas
  • Deletes data after 90 days
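
The age boundaries can be sanity-checked with simple date math (illustrative only; in production the policy engine drives transitions, not application code):

```typescript
type Tier = 'hot' | 'warm' | 'cold' | 'delete';

// Classify an index by age in days, mirroring the 7d/30d/90d
// boundaries in the policy above.
function tierForAge(ageDays: number): Tier {
  if (ageDays >= 90) return 'delete';
  if (ageDays >= 30) return 'cold';
  if (ageDays >= 7) return 'warm';
  return 'hot';
}
```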

Query Optimization

Use query caching and request batching:

Code:

// Cache frequently used queries
async function getCachedResults(key: string, queryFn: () => Promise<any>) {
  const cached = await cache.get(key);
  if (cached) return cached;

  const results = await queryFn();
  await cache.set(key, results, 3600); // Cache for 1 hour
  return results;
}

// Use Mget for multiple document lookups
async function getMultipleDocuments(ids: string[]) {
  return client.mget({
    body: {
      docs: ids.map(id => ({
        _index: 'products',
        _id: id
      }))
    }
  });
}

Monitoring and Alerting

Set up CloudWatch alarms for critical metrics:

  • Cluster Health: alert when Red; page the on-call engineer
  • Disk Space: alert above 80%; scale the cluster or archive old data
  • JVM Memory: alert above 85%; review and scale memory
  • Query Latency: alert above 1000ms (p99); optimize queries or add nodes
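
Those thresholds translate directly into alarm checks. A minimal evaluator (the names and metric shape are ours, not the CloudWatch API):

```typescript
interface ClusterMetrics {
  clusterStatus: 'green' | 'yellow' | 'red';
  diskUsedPercent: number;
  jvmMemoryPercent: number;
  p99LatencyMs: number;
}

// Return the list of alarms that should fire for a metrics snapshot,
// using the thresholds from the table above.
function firingAlarms(m: ClusterMetrics): string[] {
  const alarms: string[] = [];
  if (m.clusterStatus === 'red') alarms.push('cluster-red');
  if (m.diskUsedPercent > 80) alarms.push('disk-space');
  if (m.jvmMemoryPercent > 85) alarms.push('jvm-memory');
  if (m.p99LatencyMs > 1000) alarms.push('query-latency');
  return alarms;
}
```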

Disaster Recovery and Backups

Snapshot Strategy

Code:

async function createSnapshot(snapshotName: string) {
  await client.snapshot.create({
    repository: 's3-backup',
    snapshot: snapshotName,
    body: {
      indices: '*'
    }
  });
}

// Restore from snapshot
async function restoreSnapshot(snapshotName: string) {
  await client.snapshot.restore({
    repository: 's3-backup',
    snapshot: snapshotName
  });
}

Schedule daily snapshots to S3. Test restoration quarterly.
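
Date-stamped snapshot names keep the repository sortable and make "did last night's snapshot run" easy to answer; the daily- prefix here is our own convention:

```typescript
// Generate a unique, sortable snapshot name like "daily-2027-03-14"
// for use with the createSnapshot helper above.
function dailySnapshotName(date: Date): string {
  const iso = date.toISOString().slice(0, 10); // YYYY-MM-DD
  return `daily-${iso}`;
}
```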

Security Considerations

Access Control

On managed AWS OpenSearch domains, you enable fine-grained access control through the domain configuration; the security-plugin settings below (opensearch.yml style) apply when you operate the cluster yourself:

Code:

opensearch_yml:
  opendistro_security:
    ssl:
      http:
        enabled: true
        pemcert_filepath: certs/node.pem
        pemkey_filepath: certs/node-key.pem
        pemtrustedcas_filepath: certs/root-ca.pem
      transport:
        enforce_hostname_verification: true

    authc:
      basic_internal_auth_domain:
        enabled: true
        order: 0
        http_enabled: true
        transport_enabled: true
        auth_backend: internal

    authz:
      default:
        enabled: true
        roles:
          - admin
          - user

Index-Level Encryption

Code:

// Enable encryption at rest
const domainConfig = {
  encryptionAtRestOptions: {
    enabled: true,
    kmsKeyArn: 'arn:aws:kms:region:account:key/id'
  },
  nodeToNodeEncryptionOptions: {
    enabled: true
  }
};

Common Challenges and Solutions

Problem: Out of memory errors
Solution: Move to larger instance types (JVM heap is capped near 32GB), add data nodes, reduce the shard count per node

Problem: Slow queries
Solution: Prefer filters over scored clauses, implement caching, review shard sizing, and profile with the slow query logs

Problem: Data inconsistency
Solution: Use bulk operations with explicit refreshes, implement monitoring, test failure scenarios

Problem: High disk usage
Solution: Compress data, implement a lifecycle (ISM) policy, delete old indices, optimize mappings

Advanced Query Patterns

Complex Nested Queries

Handle hierarchical data structures:

Code:

async function findComplexResults(filters: {
  category: string;
  priceRange: { min: number; max: number };
  ratings: { min: number };
}) {
  return client.search({
    index: 'products',
    body: {
      query: {
        bool: {
          must: [
            { term: { 'category.keyword': filters.category } }
          ],
          filter: [
            {
              range: {
                price: {
                  gte: filters.priceRange.min,
                  lte: filters.priceRange.max
                }
              }
            },
            {
              nested: {
                path: 'reviews',
                query: {
                  range: {
                    'reviews.rating': {
                      gte: filters.ratings.min
                    }
                  }
                }
              }
            }
          ]
        }
      },
      aggs: {
        avg_price: { avg: { field: 'price' } },
        price_histogram: {
          histogram: {
            field: 'price',
            interval: 50
          }
        }
      }
    }
  });
}

Time-Series Analytics

Optimize for time-series data:

Code:

// Create time-based index template
async function createTimeSeriesTemplate() {
  return client.indices.putIndexTemplate({
    name: 'metrics-template',
    body: {
      index_patterns: ['metrics-*'],
      template: {
        settings: {
          number_of_shards: 2,
          number_of_replicas: 1
        },
        mappings: {
          properties: {
            '@timestamp': { type: 'date' },
            metric_name: { type: 'keyword' },
            value: { type: 'float' },
            tags: { type: 'keyword' }
          }
        }
      }
    }
  });
}

// Query metrics over time
async function getMetricsOverTime(
  metricName: string,
  startTime: Date,
  endTime: Date
) {
  return client.search({
    index: 'metrics-*',
    body: {
      query: {
        bool: {
          must: [
            { term: { metric_name: metricName } },
            {
              range: {
                '@timestamp': {
                  gte: startTime.toISOString(),
                  lte: endTime.toISOString()
                }
              }
            }
          ]
        }
      },
      aggs: {
        timeline: {
          date_histogram: {
            field: '@timestamp',
            fixed_interval: '1h'
          },
          aggs: {
            avg_value: { avg: { field: 'value' } },
            max_value: { max: { field: 'value' } }
          }
        }
      }
    }
  });
}

Custom Scoring and Ranking

Implement business logic in search results:

Code:

async function rankByRelevance(query: string) {
  return client.search({
    index: 'products',
    body: {
      query: {
        function_score: {
          query: {
            multi_match: {
              query: query,
              fields: ['title', 'description']
            }
          },
          functions: [
            // Boost popular products
            {
              field_value_factor: {
                field: 'popularity_score',
                modifier: 'log1p',
                factor: 1.2
              }
            },
            // Boost highly rated products
            {
              filter: {
                range: { avg_rating: { gte: 4.5 } }
              },
              weight: 2
            },
            // Decay score for older products
            {
              gauss: {
                'created_at': {
                  origin: 'now',
                  scale: '365d',
                  decay: 0.5
                }
              }
            }
          ],
          boost_mode: 'multiply'
        }
      }
    }
  });
}
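
The gauss decay is worth unpacking: sigma is chosen so that a document exactly scale away from the origin is multiplied by decay. A worked version of the formula (offset assumed to be 0):

```typescript
// Gaussian decay as used by function_score. Sigma^2 is derived so
// that score(scale) = decay:
//   sigma^2 = -scale^2 / (2 * ln(decay))
//   score(x) = exp(-x^2 / (2 * sigma^2))
function gaussDecay(distanceDays: number, scaleDays: number, decay: number): number {
  const sigma2 = -(scaleDays * scaleDays) / (2 * Math.log(decay));
  return Math.exp(-(distanceDays * distanceDays) / (2 * sigma2));
}
```

So with scale: '365d' and decay: 0.5 as above, a product created a year ago gets half the boost of one created today, and the falloff is smooth in between.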

Optimization at Scale

Connection Pooling and Retry Logic

Handle transient failures gracefully:

Code:

const client = new Client({
  node: process.env.OPENSEARCH_ENDPOINT,
  auth: { username, password },
  // Connection pooling via the underlying HTTP agent
  agent: {
    maxSockets: 40,
    keepAlive: true
  },
  maxRetries: 3,
  requestTimeout: 30000,
  // Skip dedicated ingest nodes when routing requests
  nodeFilter: (node) => !(node.roles && node.roles.ingest),
  sniffInterval: false // Disable sniffing against the managed endpoint
});

Bulk Operations with Circuit Breaker

Prevent overwhelming the cluster:

Code:

class BulkProcessor {
  private queue: any[] = [];
  private isProcessing = false;
  private failureCount = 0;
  private maxConsecutiveFailures = 3;

  async add(operation: any) {
    this.queue.push(operation);
    if (this.queue.length >= 1000) {
      await this.flush();
    }
  }

  async flush() {
    if (this.isProcessing || this.queue.length === 0 ||
        this.failureCount >= this.maxConsecutiveFailures) {
      return;
    }

    this.isProcessing = true;
    // Snapshot and clear the queue so new operations can accumulate
    // while this batch is in flight
    const batch = this.queue;
    this.queue = [];

    try {
      const response = await client.bulk(
        { body: batch },
        { requestTimeout: 60000 }
      );

      if (response.body.errors) {
        // Some items were rejected; count it toward the breaker
        this.failureCount++;
      } else {
        this.failureCount = 0;
      }
    } catch (error) {
      // Transport failure: re-queue the batch for a later retry
      this.failureCount++;
      this.queue = batch.concat(this.queue);
      console.error('Bulk operation failed:', error);
    } finally {
      this.isProcessing = false;
    }
  }
}


FAQ

Q: How many shards should my index have? A: Rule of thumb: shard count should match your data node count. Start with 5, adjust based on query performance. Too many shards = overhead, too few = limited parallelism.

Q: What's the maximum size of an index? A: There's no hard limit, but indices typically perform best under 50GB per shard. Larger shards take longer to recover if a node fails.
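
That guideline gives a quick sizing heuristic (a rule of thumb of ours, not an official AWS formula):

```typescript
// Estimate primary shard count from expected index size, keeping
// each shard under a target size (default 50 GB per shard).
function recommendedPrimaryShards(expectedGb: number, maxShardGb: number = 50): number {
  return Math.max(1, Math.ceil(expectedGb / maxShardGb));
}
```

A 200GB index would get 4 primaries; a 10GB index needs only 1.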

Q: Can I rename an index? A: No, but you can use index alias to point to a new index and reindex data with zero downtime.

Q: How do I handle schema changes? A: Use index aliases and reindex to a new index with new mapping. This allows zero-downtime migrations.
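
The alias cut-over both answers rely on boils down to one atomic _aliases call; a sketch of a builder for its body (helper name is ours), which you would pass to something like client.indices.updateAliases:

```typescript
interface AliasAction {
  remove?: { index: string; alias: string };
  add?: { index: string; alias: string };
}

// Build the body for POST /_aliases that atomically repoints an
// alias from the old index to the new one: readers never see a gap.
function aliasSwapBody(alias: string, oldIndex: string, newIndex: string): { actions: AliasAction[] } {
  return {
    actions: [
      { remove: { index: oldIndex, alias } },
      { add: { index: newIndex, alias } }
    ]
  };
}
```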

Q: Should I use T3 instances for production? A: No. Use R6g or M6g instances for production. T3 instances are for development/testing only. They'll throttle under consistent load.

Q: What's the cost of OpenSearch on AWS? A: Roughly $0.30-0.50 per instance-hour for data nodes, plus data transfer. A production cluster costs $2k-5k/month depending on size and region.

Q: How do I prevent query timeouts? A: Set appropriate timeouts, use request caching, optimize query complexity, and monitor slow query logs.

Q: Can I use OpenSearch for real-time analytics? A: Yes, but be aware of refresh interval trade-offs. Frequent refreshes (1s) use more resources. Most use 30-60 second intervals.

Moving Forward

OpenSearch removes the operational pain of search infrastructure while maintaining the flexibility to handle complex queries and analytics. The teams we work with across web development, SaaS, and cloud solutions rely on it for customer search, log analysis, and business analytics.

Start with basic full-text search. Add filtering and aggregations as your needs grow. Implement proper indexing strategy early. Monitor performance metrics closely. The foundation you build today determines how easily you scale tomorrow. With proper planning and operational discipline, OpenSearch can handle search and analytics requirements for years without major rearchitecting.

Tags: AWS, OpenSearch, Analytics, Terraform, Elasticsearch, Logs, Dashboards

About the Author

Viprasol Tech Team

Custom Software Development Specialists

The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 1000+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement.

Specialties: MT4/MT5 EA Development, AI Agent Systems, SaaS Development, Algorithmic Trading
