AWS Solutions Architect Associate (SAA-C02) Review Material – S3

General

  • Files are stored in containers called ‘buckets.’
  • A bucket is regional, but the name must be globally unique (universal namespace).
  • A bucket name:
    • can only contain lowercase letters, numbers, dots (.) and hyphens (-)
    • must begin with a letter or number
    • cannot be an IP address
  • Unlimited storage.
  • Files are stored in buckets as objects.
  • Objects are identified by their key.
  • There is no concept of directories in S3, although a key name may contain a prefix that makes it look like a directory path.
  • Object size can range from 0 bytes to 5 TB.
  • As of Dec 2020, all operations are strongly consistent.
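
The naming rules above can be sketched as a small check. This is a minimal illustration rather than a complete validator: the function name is hypothetical, and the 3 to 63 character length limit is an extra rule that the notes above do not mention.

```python
import ipaddress
import re

# Lowercase letters, digits, dots and hyphens; first character is a letter or digit.
_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]*$")


def is_valid_bucket_name(name: str) -> bool:
    """Check a candidate name against the rules in the notes (plus a length limit)."""
    if not 3 <= len(name) <= 63:  # assumption: length limit, not stated in the notes
        return False
    if not _NAME_RE.match(name):
        return False
    try:
        ipaddress.IPv4Address(name)  # names formatted like 192.168.5.4 are rejected
        return False
    except ValueError:
        return True
```

For example, `is_valid_bucket_name("my-bucket.2024")` passes, while `"My-Bucket"`, `"-bucket"` and `"192.168.5.4"` are all rejected.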

Versioning

  • Versioning is enabled at the bucket level. It is not enabled by default.
  • Objects that existed before versioning was enabled get version ID = ‘null’ (i.e. they have no version).
  • When an object is deleted without specifying a version, it is not removed but hidden: a delete marker is placed on top of it and becomes the latest version.
  • When a specific version is deleted by its version ID, it is deleted permanently.
  • Once enabled, versioning cannot be turned off again, only suspended. Suspending it does not delete the existing versions.
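
The delete behaviour is easier to see in a toy model. The class below is a hypothetical in-memory sketch, not the S3 API: it only mimics how a plain delete adds a delete marker while a delete that names a version removes that version for good.

```python
import itertools


class VersionedBucket:
    """Toy in-memory model of a version-enabled bucket (not the S3 API)."""

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, body); last item is the latest
        self._ids = itertools.count(1)

    def put(self, key, body):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key, version_id=None):
        if version_id is None:
            # Plain delete: stack a delete marker (body None) on top; nothing is removed.
            return self.put(key, None)
        # Delete of a specific version: that version is removed permanently.
        self._versions[key] = [v for v in self._versions[key] if v[0] != version_id]
        return version_id

    def get(self, key):
        versions = self._versions.get(key, [])
        if not versions or versions[-1][1] is None:
            return None  # missing, or hidden behind a delete marker
        return versions[-1][1]
```

In this model, deleting the delete marker itself (by its version ID) makes the previous version visible again, which mirrors how an accidental delete is undone on a versioned bucket.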

Encryption

  • Object encryption can be done on the server-side or client-side.
  • You encrypt the object, not the bucket, although you can set default encryption for the bucket.
  • Default server-side encryption is configured at the bucket level. When enabled, all uploaded objects are encrypted by S3 (server side) using the default key.
  • S3 exposes both HTTP and HTTPS REST API endpoints.
  • Server-side encryption keys can be:
    • SSE-S3
      • a key that is managed by S3
      • encrypts each object with a unique key
      • must pass the header “x-amz-server-side-encryption”: “AES256” in the REST API call
      • each object key is itself encrypted with a root key that S3 rotates regularly
    • SSE-KMS
      • a key stored and managed in KMS (either an AWS managed key or a customer managed key, CMK)
      • must pass the header “x-amz-server-side-encryption”: “aws:kms” in the REST API call
    • SSE-C
      • must use HTTPS
      • a key stored and managed by the client
      • must provide the algorithm (x-amz-server-side-encryption-customer-algorithm), the encryption key (x-amz-server-side-encryption-customer-key) and its MD5 digest (x-amz-server-side-encryption-customer-key-MD5) in the REST API call
    • DSSE-KMS
      • dual-layer server-side encryption
      • applies two layers of encryption to objects when they are uploaded
      • AWS KMS keys must be in the same Region as the bucket
  • Client-side encryption
    • the client stores and manages the encryption keys and also performs the actual encryption.
    • no server-side encryption is performed.
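
As a rough sketch of the request headers involved in the server-side options, the helpers below build the header dictionaries with the standard library only. The function names are hypothetical, and the base64 encoding of the customer key and of its MD5 digest is my understanding of what the REST API expects, so verify it against the API reference before relying on it.

```python
import base64
import hashlib


def sse_s3_headers():
    """Header for encryption with S3-managed keys."""
    return {"x-amz-server-side-encryption": "AES256"}


def sse_kms_headers():
    """Header for encryption with a KMS key."""
    return {"x-amz-server-side-encryption": "aws:kms"}


def sse_c_headers(key: bytes):
    """Headers for a customer-provided 256-bit key; send them over HTTPS only."""
    if len(key) != 32:
        raise ValueError("expected a 256-bit (32-byte) key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        # Assumption: key and digest are sent base64-encoded.
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode("ascii"),
        "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
            hashlib.md5(key).digest()
        ).decode("ascii"),
    }
```

With the customer-provided key, the same three headers have to accompany every later GET of the object, because S3 does not keep the key.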

Storage Class

  1. Standard
  2. Standard-IA
  3. Intelligent Tiering
    • 3 Access Tiers:
      1. Frequent
      2. Infrequent
      3. Archive Instant Access
    • Opt-ins
      1. Archive Access
      2. Deep Archive Access
  4. One Zone-IA
  5. Glacier Instant Retrieval
  6. Glacier Flexible Retrieval
  7. Glacier Deep Archive
  8. Reduced Redundancy (not recommended)
  • Glacier is different from S3. It uses containers called Vaults instead of ‘buckets’. It also has other resources such as Archive, Jobs, and Notification Configuration. The Glacier storage classes present Glacier as another form of storage, but underneath it, it still uses Glacier.

| Storage class | Designed for | Durability (designed for) | Availability (designed for) | Availability Zones | Min storage duration | Min billable object size | Other considerations |
|---|---|---|---|---|---|---|---|
| Standard | Frequently accessed data (more than once a month) with millisecond access | 11 9s | 99.99% | >=3 | None | None | None |
| Standard-IA | Long-lived, infrequently accessed data (once a month) with millisecond access | 11 9s | 99.9% | >=3 | 30 days | 128KB | Per GB retrieval fees apply. |
| Intelligent Tiering | Data with unknown, changing, or unpredictable access patterns | 11 9s | 99.9% | >=3 | None | None | Monitoring and automation fees per object apply. No retrieval fees. |
| One Zone-IA | Re-creatable, infrequently accessed data (once a month) with millisecond access | 11 9s | 99.5% | 1 | 30 days | 128KB | Per GB retrieval fees apply. Not resilient to the loss of the Availability Zone. |
| Glacier Instant Retrieval | Long-lived archive data accessed once a quarter with millisecond access | 11 9s | 99.9% | >=3 | 90 days | 128KB | Per GB retrieval fees apply. |
| Glacier Flexible Retrieval | Long-lived archive data accessed once a year with retrieval times of minutes to hours | 11 9s | 99.99% (after you restore objects) | >=3 | 90 days | 40KB | Per GB retrieval fees apply. You must first restore archived objects before you can access them. |
| Glacier Deep Archive | Long-lived archive data accessed less than once a year with retrieval times of hours | 11 9s | 99.99% (after you restore objects) | >=3 | 180 days | 40KB | Per GB retrieval fees apply. You must first restore archived objects before you can access them. |

Lifecycle Management

  • A lifecycle configuration consists of rules that define actions on a set of objects. These actions include:
    • move objects between storage classes
    • permanently delete noncurrent versions
    • expire objects
    • delete expired object delete markers
    • delete incomplete multi-part uploads
  • Order of transition: an object can only move down this list, never from a lower entry back to a higher one.
    1. Standard
    2. Standard-IA
    3. Intelligent Tiering
    4. One Zone-IA (cannot transition to Glacier Instant Retrieval)
    5. Glacier Instant Retrieval
    6. Glacier Flexible Retrieval
    7. Glacier Deep Archive
  • For the number of days required, refer to the Min storage duration column in the table above.
    • The rule for Standard is that objects must be >= 30 days old to transition to Standard-IA or One Zone-IA.
    • Example 1: Standard->Standard-IA – Minimum Days after object creation = 30 days because of the rule above
    • Example 2: Standard->Standard-IA->One Zone-IA – One Zone-IA Minimum Days after object creation = 60 days because 30 days in Standard + 30 days in Standard-IA
    • Example 3: Standard->Glacier Deep Archive – Minimum Days after object creation = 0 days because the rule above does not apply
    • Example 4: Standard->Glacier Instant Retrieval->Glacier Deep Archive – Glacier Deep Archive Minimum Days after object creation = 100 days (assuming the Glacier Instant Retrieval transition is set to 10 days) because the object must stay in Glacier Instant Retrieval for 90 days.
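
The arithmetic in these examples can be written down as a short helper. This is only a sketch of the rules of thumb in these notes, assuming the chain starts in Standard; the function name and the class labels in the dictionary are illustrative, and the minimum durations are copied from the storage class table.

```python
# Minimum storage duration in days, taken from the storage class table.
MIN_STORAGE_DAYS = {
    "STANDARD": 0,
    "STANDARD_IA": 30,
    "INTELLIGENT_TIERING": 0,
    "ONEZONE_IA": 30,
    "GLACIER_IR": 90,      # Glacier Instant Retrieval
    "GLACIER": 90,         # Glacier Flexible Retrieval
    "DEEP_ARCHIVE": 180,
}
IA_CLASSES = {"STANDARD_IA", "ONEZONE_IA"}


def earliest_transition_days(chain, first_transition_day=0):
    """Return the earliest 'days after creation' for each transition in a chain.

    chain starts with the class the object is created in (assumed STANDARD),
    e.g. ["STANDARD", "STANDARD_IA", "ONEZONE_IA"].
    """
    days = []
    current_day = 0
    for previous, target in zip(chain, chain[1:]):
        if not days:
            # First hop: 30-day rule for IA targets, otherwise the caller's choice.
            floor = 30 if target in IA_CLASSES else 0
            current_day = max(first_transition_day, floor)
        else:
            # Later hops: the object must sit out the previous class's minimum duration.
            current_day += MIN_STORAGE_DAYS[previous]
        days.append(current_day)
    return days
```

For instance, `earliest_transition_days(["STANDARD", "STANDARD_IA", "ONEZONE_IA"])` gives `[30, 60]`, and `earliest_transition_days(["STANDARD", "GLACIER_IR", "DEEP_ARCHIVE"], first_transition_day=10)` gives `[10, 100]`.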

Bucket Security and Access

  • Ways to allow or deny access to S3:
    • IAM policy
    • Bucket policy
    • Bucket ACL
    • Object ACL
  • Difference between IAM policy and S3 (bucket) policy in S3 context:
    • IAM policy is attached to a role, user, or group while S3 policy is attached to a bucket.
    • S3 policy answers the question “who (user or role) can access my resources?”; IAM policy answers the question “which S3 buckets and objects can I access?”
    • IAM policy does not require a Principal element while S3 policy does.
  • Objects can be owned by other AWS accounts (e.g. the uploading account) if the ‘Bucket owner enforced’ Object Ownership setting is turned off, i.e. ACLs are enabled.
  • The effective permission is evaluated over the union of all applicable policies and ACLs: access is granted only if something allows it and nothing explicitly denies it (an explicit deny always wins).
  • ACLs control access not only at the bucket level but at the object level as well.
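
To make the difference concrete, here is a hypothetical bucket policy written as a Python dictionary, together with a deliberately simplified decision function for requests within a single account. The account ID, user name and bucket name are placeholders, and `decide` is an illustration of the "explicit deny wins" idea, not the real evaluation engine.

```python
import json

# Hypothetical bucket policy: note the Principal element, which an IAM policy omits.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:user/example-user"},
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        }
    ],
}


def decide(effects):
    """Combine the effects of every matching statement across all policies.

    An explicit Deny always wins; otherwise one Allow is enough; with no
    matching statement the request is implicitly denied.
    """
    if "Deny" in effects:
        return "Deny"
    if "Allow" in effects:
        return "Allow"
    return "Deny"


policy_json = json.dumps(bucket_policy, indent=2)
```

So a request allowed by the bucket policy but denied by an IAM policy ends up denied (`decide(["Allow", "Deny"])` returns `"Deny"`), and a request that no statement mentions is denied as well.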

Static Website Hosting

  • S3 can host static content by enabling Static Website Hosting.
  • Objects must be publicly accessible. Disable Block Public Access and provide the right Bucket Policy.
  • URL is either of the following, depending on the Region:
    • <bucket-name>.s3-website.<aws region>.amazonaws.com
    • <bucket-name>.s3-website-<aws region>.amazonaws.com
  • Can specify the
    • Index Document e.g. index.html
    • Error Document e.g. error.html
    • Redirection
  • If the bucket serves cross-origin requests (i.e. the original request comes from a page on a different domain), CORS must be configured on the bucket being requested (the cross-origin bucket), not on the origin bucket.
  • CORS configuration basically adds Access-Control-Allow-* headers to the response, which allow the browser to load the resource.
  • When using a custom domain through Route 53, the bucket name must match the domain name exactly.
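
A small sketch of the two pieces a static site needs: the endpoint name and a bucket policy that makes the objects publicly readable. Both helper names are hypothetical, and the policy is one common way to grant public read access rather than the only one.

```python
def website_endpoints(bucket: str, region: str):
    """Return both endpoint styles; which one applies depends on the Region."""
    return (
        f"{bucket}.s3-website.{region}.amazonaws.com",   # dot style
        f"{bucket}.s3-website-{region}.amazonaws.com",   # dash style
    )


def public_read_policy(bucket: str):
    """Bucket policy that lets anyone read every object in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }
```

The policy only takes effect once Block Public Access has been turned off for the bucket.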

Replication

  • Bucket objects can be replicated to another bucket in the same region (SRR – Same-Region Replication) or a different region (CRR – Cross-Region Replication), in the same or a different account.
  • Versioning must be enabled on both the source and destination.
  • Objects that existed before replication was configured are not replicated automatically.
  • S3 needs an IAM role with permission to replicate objects on your behalf; you specify that role in the replication configuration of the source bucket.
  • Objects can be encrypted on the destination bucket.
  • Permanent object deletion is not replicated. However, the delete marker (latest version) can be replicated.
  • Object tags can be replicated across AWS Regions using Cross-Region Replication
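
A replication setup boils down to a role, a destination and a rule. The sketch below builds such a configuration as a plain dictionary; the field names follow my understanding of the replication configuration structure and should be checked against the API reference, and both function names are hypothetical.

```python
def replication_config(role_arn, destination_bucket_arn, replicate_delete_markers=False):
    """Build a minimal one-rule replication configuration as a plain dict."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects
        "Rules": [
            {
                "ID": "replicate-everything",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},  # empty filter: the rule applies to all objects
                "DeleteMarkerReplication": {
                    "Status": "Enabled" if replicate_delete_markers else "Disabled"
                },
                "Destination": {"Bucket": destination_bucket_arn},
            }
        ],
    }


def can_replicate(source_versioning, destination_versioning):
    """Replication requires versioning to be enabled on both buckets."""
    return source_versioning == "Enabled" and destination_versioning == "Enabled"
```

Delete marker replication is off unless requested, which matches the note that deletions are not propagated by default.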

S3 Performance

  • Object Prefix:
    • The object prefix is the part of the key before the object name, e.g. for an object with the key ‘/path1/path2/myobject.jpg’, the prefix is /path1/path2.
    • A prefix can impact performance. Prefix is used to calculate (hash) the object location.
    • Each prefix can achieve 3,500 PUT/POST/DELETE/COPY and 5,500 GET/HEAD requests/second.
    • Spreading objects across multiple prefixes can improve performance. So placing 2 objects on the same prefix allows you to perform 5,500 GET requests/sec, whereas placing them on 2 different prefixes will double that.
  • Multi-part Upload:
    • Allows for a single file to be uploaded as a set of parts.
    • Upload can be done in parallel.
    • Must be used if the object is larger than 5 GB
    • Recommended if the object is larger than 100 MB
  • SSE-KMS
    • There’s a quota on the number of requests S3 can make on KMS for encrypted objects.
    • The quota is per region.
    • A quota increase can be requested through Service Quotas.
    • Downloads and uploads count toward the quota.
  • Byte-Range Fetch
    • Sort of the opposite of Multi-part Upload.
    • Allows parts of a file to be fetched by specifying the start and end bytes in the Range header of the GetObject request.
    • Good practice: GET ranges of the same size as the parts used when the object was PUT.
  • S3 Select
    • Retrieves a subset of data using a simple SQL query.
    • Works on objects stored in CSV, JSON, or Apache Parquet format
    • Reduce network transfer by returning only the data needed.
    • Works for both S3 and Glacier (but not on Glacier Deep Archive)
    • S3 Select vs Athena:
      • S3 Select only works on 1 object; Athena can work on multiple objects.
      • S3 Select only performs simple queries; Athena can perform complex queries, e.g. group by, having.
      • S3 Select is intended to reduce data transfer; Athena is for analytics.
  • Transfer Acceleration
    • Usually used when the source is in a different region from the destination bucket.
    • Uses CloudFront edge infrastructure to transfer data from locations closest to the source into the destination S3 buckets.
    • Makes use of AWS private network.
    • Uses a distinct URL to transfer data (<bucket-name>.s3-accelerate.amazonaws.com)
    • You only pay for a transfer that is accelerated
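
Some of the numbers in this section lend themselves to a quick back-of-the-envelope sketch: the per-prefix GET ceiling, the number of parts in a multi-part upload, and the Range header values for byte-range fetches. All function names are hypothetical, and the 100 MiB default part size is simply the "recommended" threshold from the notes reused as an example value.

```python
import math

MIB = 1024 * 1024
GETS_PER_PREFIX = 5_500  # GET/HEAD requests per second per prefix (from the notes)


def prefix_of(key: str) -> str:
    """Everything before the final '/' in the key."""
    return key.rpartition("/")[0]


def max_get_rate(keys) -> int:
    """Aggregate GET ceiling when requests are spread over the keys' prefixes."""
    return GETS_PER_PREFIX * len({prefix_of(k) for k in keys})


def part_count(object_size: int, part_size: int = 100 * MIB) -> int:
    """Number of parts a multi-part upload would need."""
    return math.ceil(object_size / part_size)


def byte_ranges(object_size: int, part_size: int):
    """Range header values that cover the object in part_size chunks."""
    return [
        f"bytes={start}-{min(start + part_size, object_size) - 1}"
        for start in range(0, object_size, part_size)
    ]
```

Two keys under the same prefix share one 5,500 requests/second budget, while two keys under different prefixes get 11,000 between them. `byte_ranges(10, 4)` yields `["bytes=0-3", "bytes=4-7", "bytes=8-9"]`; note that the end byte in a Range header is inclusive.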

Object Protection

  • MFA Delete
    • Can only be enabled from CLI.
    • Versioning must be enabled
    • Requires 2 forms of authentication:
      1. security credentials
      2. MFA code
    • The following actions require MFA:
      • Permanently delete
      • Disable versioning
  • Pre-signed URL
    • A URL that is encoded with a security credential that expires and has limited permission.
    • URL can be generated using the AWS SDK, CLI, REST API or console.
    • Credentials that can be used to create Pre-signed URLs:
      1. IAM instance profile (EC2)
      2. IAM user
      3. STS (Security Token Service) temporary credentials
  • OAI (Origin Access Identity)
    • Prevents direct access to objects using the S3 URL
    • Used in conjunction with CloudFront, i.e. objects can only be accessed via CloudFront.
    • Implemented via Bucket Policy. CloudFront can update the S3 Bucket Policy when enabled through CloudFront.
  • S3 Object Lock
    • Stores objects in a write-once-read-many (WORM) model.
    • Can only be enabled during bucket creation.
    • Versioning must be enabled.
    • To protect an object you must:
      • set its Retention Period – the period of time where it can’t be overwritten or deleted, and/or
      • set a Legal Hold – prevents objects from being overwritten or deleted until it is removed.
    • 2 types of Lock Modes:
      • Governance
        • Ordinary users cannot overwrite, delete or alter the lock setting of protected objects.
        • Users with special permission can overwrite, delete or alter the lock setting of protected objects.
      • Compliance
        • No user (including root) can overwrite, delete or alter the lock settings of protected objects.
  • Glacier Vault Lock
    • Enforces compliance controls through a vault lock policy (e.g. WORM), then locks the policy against future changes.
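
The interplay of retention period, legal hold and lock mode in S3 Object Lock can be summarised in one toy decision function. It is a hypothetical sketch of the rules listed above, not a model of the real permission checks; `can_bypass_governance` stands in for the "special permission" mentioned under Governance mode.

```python
from datetime import datetime, timezone


def can_delete_version(mode, retain_until, legal_hold, now, can_bypass_governance=False):
    """Toy decision: may this locked object version be deleted or overwritten?

    mode is "GOVERNANCE", "COMPLIANCE" or None; retain_until is a datetime or None.
    """
    if legal_hold:
        return False  # a legal hold blocks deletion until it is removed
    if mode is None or retain_until is None or now >= retain_until:
        return True   # no retention, or the retention period has passed
    if mode == "GOVERNANCE":
        return can_bypass_governance  # only users with the special permission
    return False      # COMPLIANCE: nobody, not even root
```

A legal hold is checked first because it applies independently of any retention period and has no expiry date.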
