AWS Solutions Architect Associate (SAA-C02) Review Material – Storage

EBS

  • Network-attached block storage
  • Bound to a single Availability Zone
  • Some types can only be mounted on one instance; others can be attached to multiple instances
  • Must specify capacity
  • Has 6 Volume Types:
    1. gp2
      • SSD
      • For general purposes. Balances price and performance.
      • Good for random reads/writes
      • Can be a boot drive
      • 1 GiB – 16 TiB
      • Max IOPS 16,000
      • Volume size and IOPS are linked (unlike gp3)
    2. gp3
      • Similar to gp2, but IOPS is not linked to the volume size
    3. io1
      • SSD
      • Good for random reads/writes
      • Use this if you want Provisioned IOPS: sustained IOPS (e.g. for databases) or more than 16,000 IOPS.
      • 4 GiB – 16 TiB
      • Can be a boot drive
      • Max IOPS is 32,000 (2x gp2/gp3), or 64,000 on Nitro-based EC2 instances
      • Supports Multi-Attach (Linux instances only) but requires:
        • all instances in the same Availability Zone (and therefore the same Region)
        • no more than 16 instances, all built on the Nitro System
    4. io2
      • Similar to io1 but newer
      • With Block Express:
        • Size ranges from 4 GiB to 64 TiB
        • Max IOPS 256,000
      • Supports Multi-Attach for both Linux and Windows instances
    5. st1
      • HDD
      • Good for sequential reads/writes
      • Use cases: Data Warehousing, Log Processing, Big Data
      • 125 GiB – 16 TiB
      • Max throughput is 500 MiB/s
      • Cannot be the boot volume
    6. sc1
      • HDD
      • Has the lowest cost
      • Good for infrequently accessed data
      • 125 GiB – 16 TiB
      • Max throughput is 250 MiB/s
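
To make the gp2 link between volume size and IOPS concrete, here is a minimal sketch. It assumes the commonly documented gp2 rule of 3 baseline IOPS per GiB with a floor of 100 (neither figure is stated in these notes) and the 16,000 ceiling listed above. The function name is made up for illustration, and burst behavior on small volumes is ignored.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Estimate gp2 baseline IOPS from volume size (assumed rule: 3 IOPS per GiB)."""
    if not 1 <= size_gib <= 16 * 1024:
        raise ValueError("gp2 volumes range from 1 GiB to 16 TiB")
    # Clamp between the assumed floor (100) and the gp2 maximum (16,000).
    return max(100, min(3 * size_gib, 16_000))


for size in (20, 500, 6_000):
    print(size, "GiB ->", gp2_baseline_iops(size), "IOPS")
```

With gp3 the IOPS figure is chosen separately from the size, so no such formula applies.
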
  • Encryption:
    • Not enabled by default.
    • When a volume is encrypted:
      • Data at rest is encrypted
      • Data in flight between the instance and the volume is encrypted
      • Snapshots are encrypted
      • Volumes from snapshots are encrypted
    • An unencrypted volume stays unencrypted for its whole lifetime, and so do its snapshots.
    • How to encrypt an unencrypted volume:
      • Create a snapshot of the volume
      • Copy the snapshot, enabling encryption on the copy
      • Create a new volume from the encrypted snapshot
      • Attach the new volume to the EC2 instance.
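
The four steps above can be sketched in Python. This is only an illustration under assumptions: `ec2` is expected to behave like a boto3 EC2 client (`boto3.client("ec2")`), the function name and the default device name are hypothetical, and error handling, KMS key selection and detaching the old volume are left out.

```python
def encrypt_volume(ec2, volume_id, region, az, instance_id, device="/dev/sdf"):
    """Replace an unencrypted volume with an encrypted copy; returns the new volume ID."""
    # 1. Snapshot the unencrypted volume and wait for the snapshot to complete.
    snap = ec2.create_snapshot(VolumeId=volume_id)
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

    # 2. Copy the snapshot with encryption enabled on the copy.
    copy = ec2.copy_snapshot(
        SourceSnapshotId=snap["SnapshotId"], SourceRegion=region, Encrypted=True
    )
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[copy["SnapshotId"]])

    # 3. Create a new volume from the encrypted copy, in the instance's AZ.
    vol = ec2.create_volume(SnapshotId=copy["SnapshotId"], AvailabilityZone=az)
    ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])

    # 4. Attach the encrypted volume to the instance.
    ec2.attach_volume(VolumeId=vol["VolumeId"], InstanceId=instance_id, Device=device)
    return vol["VolumeId"]
```

The client is passed in as a parameter so the flow can be tried against a stub before it is pointed at a real account.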

INSTANCE STORE

  • Storage on disks physically attached to the compute node (host) that runs the instance
  • Better performance than EBS.
  • You lose the data when the EC2 instance is stopped (but not when it is rebooted), because a stopped instance may come back on a different compute node.
  • Good for cache, temporary storage, buffer.
  • Only available on some EC2 instance types

EFS

  • A POSIX NFS filesystem.
  • Only works with Linux
  • Used for sharing file storage
  • No need to provision the size. Grows automatically, up to petabytes.
  • Supports a maximum of 1,000 NFS client connections.
  • Can control access through File System Policy.
  • File System Type:
    • Regional – redundant across all Availability Zones within an AWS Region.
    • One Zone – within a single Availability Zone.
  • To mount to an EC2 Linux:
    • mount -t nfs file-system-id.efs.aws-region.amazonaws.com:/ /<mount point>
    • The domain name resolves to the IP address of the mount target in the same AZ as the EC2 instance.
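
As a small illustration of how the mount target's DNS name is put together, the sketch below builds the command shown above from its parts. The function name, file system ID, region and mount point are placeholders chosen for the example, not values from a real setup.

```python
def efs_mount_command(file_system_id: str, region: str, mount_point: str) -> str:
    """Build the NFS mount command for an EFS file system (illustrative helper)."""
    # DNS name pattern from the notes: file-system-id.efs.aws-region.amazonaws.com
    dns_name = f"{file_system_id}.efs.{region}.amazonaws.com"
    return f"mount -t nfs {dns_name}:/ {mount_point}"


# Placeholder values, for illustration only.
print(efs_mount_command("fs-12345678", "us-east-1", "/mnt/efs"))
# prints: mount -t nfs fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs
```
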
  • Access Points:
    • are application-specific entry points into an EFS file system that make it easier to manage application access to shared datasets. 
    • can enforce a user identity, including the user’s POSIX groups, for all file system requests that are made through the access point.
    • can enforce a different root directory for the file system so that clients can only access data in the specified directory or its subdirectories
  • Performance Mode:
    • Influences latency and IOPS
    • Cannot be changed once EFS is created
    • Has 2 modes:
      1. General Purpose
        • default performance mode
        • recommended for the majority of workloads and faster performance
      2. Max I/O
        • previous generation performance type that is designed for highly parallelized workloads that can tolerate higher latencies than the General Purpose mode
        • recommended for large-scale workloads
        • scale to higher levels of aggregate throughput and operations per second
        • not supported in One-Zone File System Type
  • Throughput Mode
    • Has 3 modes:
      1. Bursting
        • default throughput mode
        • throughput scales with storage size
        • baseline: 50 MiB/s per TiB of storage
      2. Provisioned
        • provision a fixed throughput regardless of the size of the file system
      3. Elastic
        • recommended for spiky or unpredictable workloads whose performance requirements are difficult to forecast, or
        • when your application drives throughput at an average-to-peak ratio of 5% or less.
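
The two numeric rules above (the Bursting baseline and the 5% average-to-peak guideline for Elastic) can be worked through with a short sketch. The function names and the sample throughput readings are hypothetical, and the baseline is treated as strictly linear, ignoring any minimums or burst credits.

```python
def bursting_baseline_mib_s(stored_tib: float) -> float:
    """Baseline throughput in Bursting mode: 50 MiB/s per TiB stored."""
    return 50.0 * stored_tib


def average_to_peak_ratio(samples_mib_s) -> float:
    """Average throughput divided by peak throughput (needs a non-empty list, peak > 0)."""
    average = sum(samples_mib_s) / len(samples_mib_s)
    return average / max(samples_mib_s)


print(bursting_baseline_mib_s(2.5))  # 125.0

# Hypothetical readings: mostly idle with one large spike.
samples = [1] * 49 + [100]
ratio = average_to_peak_ratio(samples)
print("Elastic candidate" if ratio <= 0.05 else "ratio above 5%")
```
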
  • Storage Types
    1. Standard
      • default storage
    2. Infrequent Access (IA)
      • lifecycle management will move data to this storage after N days

SNOW FAMILY

  • Physical devices used to transfer large amounts of data to AWS or for edge computing (e.g. on a ship or at a mining site)
  • Can run EC2 instances or Lambda functions.
  • Supports 80 TiB block or S3-compatible storage
  • 3 Types of Devices:
    1. Snowcone (Discontinued)
      • 8 TB of storage
      • Can connect to the network
      • AWS DataSync agent pre-installed.
      • 2 vCPUs, 4 GiB RAM
    2. Snowball Edge
      • Can do clustering – i.e. three or more Snowball Edge devices used as a single logical unit for local storage and compute purposes.
      • It may take up to 4 weeks to provision and prepare the device for your job before it is shipped.
      • Has 2 flavours:
        1. Storage Optimized
          • Has 210 TiB of NVMe storage
          • 104 vCPUs, 416 GiB of RAM
        2. Compute Optimized
          • Has 28 TiB of NVMe SSD storage
          • 104 vCPUs, 416 GiB of RAM (AMD EPYC Gen2)
          • Optional GPU
    3. Snowmobile
      • Can hold exabytes of data
      • migrate large datasets of 10PB or more in a single location
  • Use OpsHub (software installed on a local machine) to manage Snowcone or Snowball Edge devices.

FSX

  • A high-performance file system
  • Has 3 offerings:
    • FSx for Lustre
    • FSx for Windows File Server
    • FSx for NetApp ONTAP
  • FSx for Windows File Server
    • Fully managed Windows file system for sharing (like EFS for Windows)
    • Supports SMB and NTFS
    • Millions of IOPS
    • Multi-AZ
    • Can be accessed from on-prem
    • Backup to S3
    • Supports file access auditing.
  • FSx for Lustre
    • Lustre is a type of parallel distributed file system, generally used for large-scale cluster computing (source: Wikipedia)
    • For HPC
    • Millions of IOPS
    • Can integrate with S3 (expose S3 as a file system)
    • Can be accessed from on-prem
    • POSIX-compliant
    • Deployment Options:
      • Scratch File System – temporary storage, single copy
      • Persistent File System – long-term storage, data replicated within the same AZ
STORAGE GATEWAY

  • A hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage.
  • Exposes S3, FSx for Windows File Server, EBS snapshots and Glacier to on-prem clients.
  • Requires an agent running on a VM or an appliance ordered from AWS.
  • A client can connect either through a public network or AWS Direct Connect.
  • 4 Gateway Types:
    1. S3 File Gateway
      • Presents a file interface that enables you to store files as objects in Amazon S3 
      • Backed by S3
      • Can integrate with AD
      • The client talks to the agent using NFS or SMB protocol
      • Most recently used data will be cached by the agent.
    2. FSx File Gateway
      • On-premises access to Windows file shares on Amazon FSx
      • Can cache frequently accessed data (unlike accessing FSx directly)
      • File reads and writes are performed against the local cache, while Amazon FSx File Gateway synchronizes changed data to FSx for Windows File Server in the background
    3. Volume Gateway
      • Presents your applications block storage volumes using the iSCSI protocol (like a disk)
      • Backed by S3 and EBS snapshots
      • 2 Types of Volumes:
        1. Cached – data is written to S3, while frequently accessed data is retained locally in a cache for low-latency access. Maximum of 1 PiB per gateway.
        2. Stored – primary data is stored locally, so the entire dataset is available for low-latency access, and is asynchronously backed up to AWS. Maximum of 512 TiB per gateway.
    4. Tape Gateway
      • Cloud-based Virtual Tape Library (VTL)
      • Works with leading backup software
      • Backed by S3, Glacier and Glacier Deep Archive

TRANSFER FAMILY

  • A fully managed file transfer service
  • Supports the following protocols:
    • SFTP
    • FTPS
    • FTP
  • HA, multi-AZ
  • Users access the endpoint directly or through a Route 53 DNS name
  • Can authenticate with LDAP, AD, Cognito
