AWS Certified Data Engineer Associate (DEA-C01) Review Material – DataSync

DataSync

  • An online data transfer and discovery service that simplifies data migration
  • Transfer data to, from, and between AWS storage services.
  • Source/Destinations:
    1. On-Prem
      • NFS
      • SMB
      • HDFS
      • Object Storage
    2. AWS
      • S3
      • EFS
      • FSx (Windows File Server, Lustre, NetApp ONTAP, OpenZFS)
    3. Other Cloud Storage
      • GCP
      • Azure
      • Oracle
      • Alibaba
      • etc
  • Most scenarios that require a DataSync agent involve storage that you (or another cloud provider) manage, such as on-premises systems or other clouds.
  • Replication is scheduled, not continuous.
  • File permissions and metadata are preserved.
  • If bandwidth is limited, you can use AWS Snowcone to synchronise data. Snowcone ships with the DataSync agent pre-installed.
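The moving parts described above (agent, locations, task) map directly onto the AWS CLI. A hedged sketch of the flow, assuming the `aws` CLI is configured; every ARN, bucket name, role name, and activation key below is a placeholder, not a value from this lab:

```shell
# Register an already-activated agent (the activation key comes from the agent VM's console).
aws datasync create-agent \
    --agent-name my-onprem-agent \
    --activation-key AAAAA-BBBBB-CCCCC-DDDDD-EEEEE

# Source location: an NFS export reachable from the agent.
aws datasync create-location-nfs \
    --server-hostname 10.0.2.6 \
    --subdirectory /scratch \
    --on-prem-config AgentArns=arn:aws:datasync:us-east-1:123456789012:agent/agent-0123456789abcdef0

# Destination location: an S3 bucket, accessed through an IAM role DataSync can assume.
aws datasync create-location-s3 \
    --s3-bucket-arn arn:aws:s3:::my-datasync-demo-bucket \
    --s3-config BucketAccessRoleArn=arn:aws:iam::123456789012:role/DataSyncS3Role

# Task: ties source to destination. The --schedule option is what makes replication
# scheduled rather than continuous.
aws datasync create-task \
    --source-location-arn arn:aws:datasync:us-east-1:123456789012:location/loc-1111111111111111 \
    --destination-location-arn arn:aws:datasync:us-east-1:123456789012:location/loc-2222222222222222 \
    --schedule ScheduleExpression="rate(1 hour)"

# Run the task immediately instead of waiting for the schedule.
aws datasync start-task-execution \
    --task-arn arn:aws:datasync:us-east-1:123456789012:task/task-3333333333333333
```

The same four objects (agent, two locations, task) are created through the console in the hands-on below.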

Hands-On

Note: In this hands-on, we will run the agent on our local machine using Oracle VirtualBox.

  1. Download the Agent. Links will be provided when you create an agent. Select the KVM Hypervisor.
  2. The agent image will be in QCOW2 format. We must convert the file to VDI to run it in Oracle VirtualBox.
    • $ qemu-img convert -O vdi aws-datasync-2.0.1727187542.1-x86_64.xfs.gpt.qcow2 aws-datasync.vdi
  3. Create a new VM in Oracle VirtualBox using the VDI.
  4. Start the agent VM.
  5. Perform the Network Connectivity Test (Option 2) to ensure your agent can connect to AWS.
  6. Continue with the agent creation (Step 1). Under the option Activation Key, choose Manually enter your agent’s activation key.
  7. The activation key can be obtained from your agent VM. To display it, choose option ‘0’.
  8. Complete the agent creation.
  9. Spin up another VM in VirtualBox with an NFS server running. Ensure the agent VM and the NFS server can communicate on the same network. You can test this by choosing Option 3 (Test Connectivity to a Self-Managed Storage) in the agent’s VM, then Option 1 (NFS Server). In my test environment, the IP address of the NFS server is 10.0.2.6.
  10. In this hands-on activity, the NFS server shares the directory /scratch.
  11. Create two (2) locations. One is the NFS server, which is the data source; the other is the S3 bucket, which is the data destination.
  12. Create a task using the NFS location as the source, the S3 location as the destination, and the agent you created in Step 1.
  13. Create a file under the /scratch folder of your NFS server.
  14. Start your task. After a few minutes, you will see the file you created in the NFS server copied to the S3 bucket.
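For reference, the NFS server used in Steps 9–10 only needs a minimal export. A sketch assuming a Debian/Ubuntu guest (the package name is an assumption) on the same 10.0.2.0/24 VirtualBox network as the agent:

```shell
# Install the NFS server (Debian/Ubuntu package name assumed).
sudo apt-get install -y nfs-kernel-server

# Create the shared directory and export it to the agent's subnet.
sudo mkdir -p /scratch
echo "/scratch 10.0.2.0/24(rw,sync,no_subtree_check)" | sudo tee -a /etc/exports

# Re-read /etc/exports and confirm the export is visible.
sudo exportfs -ra
showmount -e localhost
```

Once the export is visible, the connectivity test in Step 9 (Option 3, then Option 1) from the agent’s VM should succeed against this server.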
