AWS Certified Data Engineer Associate (DEA-C01) Review Material – OpenSearch

Overview

  • Amazon OpenSearch Service is a managed service that makes it easy to deploy, operate, and scale OpenSearch clusters in the AWS Cloud.
  • An OpenSearch Service domain is synonymous with an OpenSearch cluster. 
  • Automatically detects and replaces failed OpenSearch Service nodes.
  • Offers two deployment options: a managed cluster or Serverless.
  • It can scale out (more nodes) or scale up/down (larger or smaller instances) without downtime.
  • It can be placed inside a VPC or made public.

Managing Indexes

  • Storage options:
    1. UltraWarm:
      • A cost-effective way to store large amounts of read-only data.
      • It uses Amazon S3 and a sophisticated caching solution to improve performance.
      • Best-suited to immutable data, such as logs.
    2. Standard:
      • Uses “hot” storage, which takes the form of instance stores or Amazon EBS volumes attached to each node.
      • Hot storage provides the fastest possible performance for indexing and searching new data.
    3. Cold:
      • Backed by Amazon S3.
      • Suitable for storing infrequently accessed or historical data.
      • Examples include infrequently accessed logs, data that must be preserved to meet compliance requirements, and logs that have historical value.
    4. OR1:
      • An instance family for Amazon OpenSearch Service that provides a cost-effective way to store large amounts of data.
      • It uses Amazon Elastic Block Store (Amazon EBS) gp3 or io1 volumes for primary storage, with data copied synchronously to Amazon S3 as it arrives.
      • Suitable for running indexing-heavy operational analytics workloads such as log analytics, observability, or security analytics.
      • OR1 instances offer an automatic data recovery option, which improves your domain’s overall reliability.
  • Index State Management (ISM):
    • It lets you define custom management policies that automate routine tasks and apply them to indexes and index patterns.
    • Done through a policy which is attached to an index.
    • Examples of policies are:
      • Hot to warm to cold storage
      • Reduce replica count
      • Take an index snapshot
  • Index Rollup:
    • It reduces storage costs by periodically rolling up old data into condensed, summarized indexes, at the cost of data granularity.
    • With index rollup, you create a new index with selected fields aggregated into coarser time buckets.
  • Index Transform:
    • You create a different, summarized view of your data centered around certain fields so you can visualize or analyze the data in different ways.
  • Cross-cluster replication:
    • Replicate user indexes, mappings, and metadata from one OpenSearch Service domain to another.
    • It can be used for disaster recovery or to reduce latency.
    • The replication follows an active-passive replication model where the local or follower index pulls data from the remote or leader index.
  • Remote reindex:
    • It lets you copy indexes from one Amazon OpenSearch Service domain to another.
    • You can use it to migrate indexes from one domain to another.
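
The “hot to warm to cold” ISM policy mentioned above can be sketched as a small Python helper that builds the policy body. This is a minimal illustration, not a production policy: the function name, the index pattern, the age thresholds, and the timestamp field are all assumptions chosen for the example. The warm_migration and cold_migration actions only apply when UltraWarm and cold storage are enabled on the domain.

```python
import json


def build_ism_policy(index_pattern, warm_after="7d", cold_after="30d",
                     timestamp_field="@timestamp"):
    """Build an ISM policy body that moves indexes hot -> warm -> cold.

    All argument values are illustrative assumptions, not recommendations.
    """
    return {
        "policy": {
            "description": "Hot to warm to cold lifecycle (example)",
            "default_state": "hot",
            "states": [
                {
                    # New indexes start in hot storage and wait there.
                    "name": "hot",
                    "actions": [],
                    "transitions": [
                        {"state_name": "warm",
                         "conditions": {"min_index_age": warm_after}}
                    ],
                },
                {
                    # Migrate the index to UltraWarm storage.
                    "name": "warm",
                    "actions": [{"warm_migration": {}}],
                    "transitions": [
                        {"state_name": "cold",
                         "conditions": {"min_index_age": cold_after}}
                    ],
                },
                {
                    # Migrate the index to cold storage.
                    "name": "cold",
                    "actions": [
                        {"cold_migration": {"timestamp_field": timestamp_field}}
                    ],
                    "transitions": [],
                },
            ],
            # Attach the policy automatically to matching new indexes.
            "ism_template": {"index_patterns": [index_pattern], "priority": 100},
        }
    }


if __name__ == "__main__":
    # "logs-*" is a hypothetical index pattern.
    print(json.dumps(build_ism_policy("logs-*"), indent=2))
```

The resulting JSON would typically be sent to the domain with a request such as PUT _plugins/_ism/policies/<policy_id>, where the policy ID is a name of your choosing.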

Security

  • Encryption at rest (except for manual snapshots)
  • Encryption in transit, including node-to-node encryption
  • Resource-based policy – specify which actions a principal can perform on the domain’s subresources
  • Identity-based policy
  • IP-based policy – restrict access to a domain to one or more IP addresses or CIDR blocks
  • Dashboard access control via:
    • Cognito
    • SAML
    • Fine-grained access control with HTTP basic authentication
    • IP-based policy
    • A domain inside a VPC can be reached through a reverse proxy, Direct Connect, VPN, or Cognito
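
As an illustration of the IP-based policy described above, the following Python sketch builds a resource-based access policy that allows HTTP requests to a domain only from given CIDR blocks. The helper name, the domain ARN, and the CIDR ranges are hypothetical; the policy shape (an IpAddress condition on aws:SourceIp) follows the standard IAM policy format.

```python
import ipaddress
import json


def build_ip_access_policy(domain_arn, allowed_cidrs):
    """Return an access policy limiting domain access to allowed_cidrs.

    Raises ValueError if any entry is not a valid IP address or CIDR block.
    """
    for cidr in allowed_cidrs:
        # Validation only; the original string is kept in the policy.
        ipaddress.ip_network(cidr)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                # All HTTP methods against the domain's subresources.
                "Action": "es:ESHttp*",
                "Resource": f"{domain_arn}/*",
                "Condition": {
                    "IpAddress": {"aws:SourceIp": list(allowed_cidrs)}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical account ID, domain name, and documentation IP range.
    arn = "arn:aws:es:us-east-1:123456789012:domain/my-domain"
    print(json.dumps(build_ip_access_policy(arn, ["192.0.2.0/24"]), indent=2))
```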

Hands-On

Stream a CloudWatch Log to Amazon OpenSearch

In this hands-on, we will stream a CloudWatch log from a Lambda function to an Amazon OpenSearch domain.
  • Create a domain on a managed cluster:
    • Use the instance type t3.small.search to be eligible for the Free Tier.
    • Place the cluster inside a VPC.
    • For the Access Policy, change to ‘Allow All’.
  • Since the OpenSearch nodes are inside a VPC, we need to create a reverse proxy server that will forward our request to the OpenSearch nodes from outside.
    • Launch an EC2 instance with a public IP in the same VPC and subnet as the OpenSearch nodes.
    • Install Nginx on the EC2 instance.
    • Configure Nginx as a reverse proxy by modifying its configuration file (nginx.conf). Set the value of proxy_pass to the OpenSearch ‘Domain endpoint’.
    • Start the Nginx server.
  • Test to see if you can connect to the OpenSearch dashboard:
    • Ensure that the Nginx EC2 instance security group allows access to port 80 (HTTP).
    • From your browser, connect to the URL http://<ec2_public_ip>/_dashboards.
  • Create an Amazon OpenSearch Service subscription filter on a CloudWatch log group.
    • For this hands-on, I used a Lambda CloudWatch log group.
  • Test the streaming:
    • Generate a new log in the log group.
    • Check if a new index is created in the OpenSearch domain.
    • Query the streamed data from the Dashboard and compare it with the CloudWatch log.
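
The “check if a new index is created” step can also be done from a script instead of the Dashboard. The sketch below lists the domain's indexes through the reverse proxy and filters for the streamed ones. It rests on several assumptions: that the Nginx proxy forwards every path (not only /_dashboards) to the domain endpoint, that the access policy permits the request, and that the streamed indexes start with the prefix cwl-. The function names and the sample data are hypothetical.

```python
import json
import urllib.request


def indices_url(proxy_host):
    """URL of the _cat/indices API as reached through the reverse proxy."""
    return f"http://{proxy_host}/_cat/indices?format=json"


def find_log_indices(cat_indices, prefix="cwl-"):
    """Return the sorted names of indexes whose name starts with prefix.

    cat_indices is the parsed JSON list returned by _cat/indices.
    The "cwl-" prefix is an assumption about how streamed indexes are named.
    """
    return sorted(
        row["index"]
        for row in cat_indices
        if row.get("index", "").startswith(prefix)
    )


def fetch_indices(proxy_host, timeout=10):
    """Call _cat/indices through the proxy and parse the JSON response."""
    with urllib.request.urlopen(indices_url(proxy_host), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Offline sample shaped like a _cat/indices?format=json response.
    sample = [
        {"index": ".kibana_1", "health": "green"},
        {"index": "cwl-2024.01.15", "health": "yellow"},
    ]
    print(find_log_indices(sample))
    # Against a real proxy, replace the sample with:
    # find_log_indices(fetch_indices("<ec2_public_ip>"))
```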
