AWS Certified Data Engineer Associate (DEA-C01) Review Material – Application Discovery Service and Database Migration Service

Application Discovery Service

  • Collects usage and configuration data about your on-premises servers and databases. 
  • All discovered data is stored in your AWS Migration Hub home Region. 
  • Application Discovery Service APIs help you export the system performance and utilization data for your discovered servers and network connections between servers.
  • Integrated with AWS Migration Hub and AWS Database Migration Service Fleet Advisor. 
  • Two (2) ways of performing discovery and collecting data about your on-premises servers:
    1. Agentless discovery (Agentless Collector)
      • Works for VMware environments only.
      • Deploy the Application Discovery Service Agentless Collector (an OVA file) through your VMware vCenter.
      • It identifies virtual machines (VMs) and hosts associated with the VMware vCenter.
      • It does not support physical servers.
    2. Agent-based discovery (Discovery Agent)
      • Deploys the AWS Application Discovery Agent on each of the VMs and physical servers.
      • The agent is available for Windows and Linux operating systems.
      • Cannot collect:
        • Database configuration data
        • VM utilization metrics
        • Database utilization metrics
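
The choice between the two discovery modes above can be sketched as a small helper. This is illustrative only (it is not an AWS API, and the host-type labels are made up):

```python
# Illustrative helper (not an AWS API): pick a discovery method using the
# rules above -- the Agentless Collector handles VMware VMs only, while
# the Discovery Agent is installed on each physical or virtual server.
def choose_discovery_method(host_type: str) -> str:
    """host_type is 'vmware-vm' or 'physical' (hypothetical labels)."""
    if host_type == "vmware-vm":
        return "agentless-collector"  # OVA deployed through vCenter
    if host_type == "physical":
        return "discovery-agent"      # Windows/Linux agent per server
    raise ValueError(f"unknown host type: {host_type}")

print(choose_discovery_method("physical"))  # -> discovery-agent
```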

Database Migration Service (DMS)

  • Migrates relational databases, data warehouses, NoSQL databases, and other types of data stores.
  • Migrates your data into the AWS Cloud or between combinations of cloud and on-premises setups.
  • Supports homogeneous (same DB engine) and heterogeneous (different DB engines) migrations.
  • Supports copying data to non-database targets such as:
    • Amazon Kinesis Data Streams (KDS)
    • Amazon S3
    • Apache Kafka
  • Has the ability to discover your source data stores and convert source schemas.
  • Supports continuous data replication
    • by capturing ongoing changes after initial (full-load) migration.
    • This process is also called change data capture (CDC).
  • At a basic level, AWS DMS is a server in the AWS Cloud that runs replication software.
    • In short, you need an EC2-based replication instance that runs the replication in the same Region (and preferably the same VPC) as your databases.
  • AWS DMS creates the tables and associated primary keys if they don’t exist on the target. You can create the target tables yourself if you prefer.
  • Use DMS Fleet Advisor to collect metadata and performance metrics from multiple database environments.
    • It collects metadata and metrics from your on-premises database and analytics servers from one or more central locations, without needing to install it on every computer.
  • You can also use the AWS Schema Conversion Tool (AWS SCT) to create some or all of the target tables, indexes, views, triggers, and so on.
    • It is a standalone application that provides a project-based user interface
    • Must be installed on a machine that has access to the data source
  • AWS DMS does not migrate empty tables. As a workaround, you can insert dummy data into the empty tables before running the migration task so that all tables are migrated.

Hands-On

DMS Fleet Advisor

  • We will use VirtualBox VMs to simulate our on-prem infrastructure for this hands-on.
  • We must create two (2) VMs running on the same network (NAT Network). One of the VMs will run the Data Collector, and the other will run the OS and the database we will discover.
    • Install Windows Server 2016 on VM no. 1. We need a Windows Server because the Data Collector only runs on Windows Server 2012 or higher. You can follow this link to download an evaluation copy of the Windows Server.
      • Configure your Windows Server as a domain controller (DC) and install Active Directory Domain Services (AD DS). You will need to provide an LDAP server in your Data Collector, and since Windows Server comes with an LDAP server (AD DS), we can also use it as our LDAP server.
      • Install the Data Collector.
        • Create a Data Collector in AWS (Before this, you must have your IAM Roles and S3 bucket ready. Refer to this link for details).
        • Download the local Data Collector from the link provided in the dashboard.
        • Run the installation program on the Windows Server (Note: you need to install .NET 4.8 before you can run the installer).
    • Install a Linux OS on VM no. 2. This VM will host our database and be scanned by the Data Collector. In this hands-on, I chose Ubuntu, but you can choose any flavor of Linux or use Windows OS.
      • Install the MySQL server. This link will help you with the installation. After installation, you need to update the MySQL configuration to allow remote connections.
      • Install SSH. The Data Collector uses SSH to connect to your VM to collect data about your OS.
      • Create a test database/table and test data in the MySQL database. I will reuse the test database/table and test data from this hands-on.
      • Configure your VM to join the domain of VM no. 1. This link provides instructions on how to make an Ubuntu OS join a Windows domain.
  • Configure the Data forwarding and LDAP servers in the Data Collector.
  • Run the discovery.
  • Add the discovered OS and database to the Monitored Objects.
  • Run the data collection.
  • After a couple of hours, the data collection will be complete.
  • You can then start to analyze the inventory:
    • To analyze your inventory, click ‘Analyze Inventories’.
    • You can also generate recommendations.
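
The moving parts in this lab boil down to a short connectivity checklist, sketched below. The host names are made up for illustration, and the ports are the usual defaults (assumptions; adjust to your setup):

```python
# Connectivity the Data Collector needs in this lab setup. Host names
# are illustrative; ports are common defaults (assumptions).
REQUIRED_ACCESS = {
    "ldap":  ("vm1-windows-server", 389),   # AD DS doubles as the LDAP server
    "ssh":   ("vm2-ubuntu", 22),            # OS-level data collection
    "mysql": ("vm2-ubuntu", 3306),          # database metadata and metrics
}

def unreachable(open_ports: dict) -> list:
    """Return the services whose (host, port) is missing from open_ports."""
    return [svc for svc, (host, port) in REQUIRED_ACCESS.items()
            if port not in open_ports.get(host, set())]

print(unreachable({"vm1-windows-server": {389}, "vm2-ubuntu": {22}}))
```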

Schema Conversion Tool (SCT)

  • In this hands-on, we will convert the MySQL schema in our VM to a PostgreSQL schema.
  • We will continue to use one of the VMs we used in this hands-on, but we need to make changes to our VirtualBox NAT Network settings. The SCT instance in AWS must connect to our MySQL database on our Linux machine. So, we need to create a Port Forwarding rule that forwards traffic from our host to the VirtualBox VM hosting our MySQL database. We must also ensure our host’s firewall does not block our chosen port. In this hands-on, I used host port 53306.
  • Create an AWS RDS PostgreSQL database. We will use AWS RDS PostgreSQL as our target database and convert our MySQL schema to a PostgreSQL schema.
  • From the AWS DMS console, create two (2) Data Providers.
    • The first Data Provider is the source, the MySQL database. In your Data Provider configuration, use the VirtualBox host IP and the host port in your Port Forwarding Rules as your server and port.
    • The 2nd Data Provider is the AWS RDS PostgreSQL instance.
  • Create an Instance Profile:
    • Select a VPC where the SCT instance will run and create a Subnet Group. To keep it simple, use the Default VPC because it already has public subnets.
    • Create an Instance Profile with a Public IP.
  • Create a Migration Project:
    • Create users in your source and target databases. This link and this link describe the roles and privileges your source and target users will need. Afterwards, create two (2) secrets in AWS Secrets Manager to store the usernames and passwords of these database users.
    • Create an S3 bucket that DMS uses to store schema conversion metadata.
    • Create the necessary IAM Roles. You need IAM Roles to access the secrets in Secrets Manager and store data in S3.
    • Create a Migration Project.
      • For this hands-on, we will prefix the table name in the target database with ‘pg’. We will define this in the Migration Project’s Transformation Rule.
  • Perform Schema Conversion:
    • From the Schema Conversion tab of the Migration Project, launch the schema conversion.
    • Under the Source Schema window pane of the Schema Conversion dashboard, convert the source schema.
      • When the process completes, you should see a new database/table under the Target Schema window pane. The table name should be prefixed with ‘pg’.
      • Note that these are not yet effective in your target database. You need to apply the changes.
    • Apply the changes to your target database.
      • Right-click on the target schema and click ‘Apply’.
      • Once the process is complete, verify that the database and tables have been created successfully in your target database.
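
To give a feel for what the conversion step does, here is a heavily simplified sketch of the kind of MySQL-to-PostgreSQL type mapping it performs (the real conversion covers many more types, defaults, triggers, and so on), plus the ‘pg’ table-name prefix from our Transformation Rule. The table name `employees` is a hypothetical example:

```python
# Simplified illustration of MySQL -> PostgreSQL schema conversion.
# The real tool handles far more cases than this mapping table.
MYSQL_TO_POSTGRES = {
    "TINYINT":  "SMALLINT",          # PostgreSQL has no 1-byte integer
    "DATETIME": "TIMESTAMP",
    "DOUBLE":   "DOUBLE PRECISION",
    "BLOB":     "BYTEA",
}

def convert_column_type(mysql_type: str) -> str:
    return MYSQL_TO_POSTGRES.get(mysql_type.upper(), mysql_type)

def convert_table_name(name: str) -> str:
    # Mirrors the Transformation Rule defined in our Migration Project.
    return "pg" + name

print(convert_column_type("datetime"), convert_table_name("employees"))
```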

Migrate Data

  • In this hands-on, we will migrate the data from our VirtualBox MySQL database to an AWS RDS PostgreSQL database.
  • We will continue to use the same VM we used in this hands-on.
  • Create two (2) Endpoints:
    • Create users in your source and target databases. This link and this link describe the roles and privileges your source and target users will need. Afterwards, create two (2) secrets in AWS Secrets Manager to store the usernames and passwords of these database users.
    • Create the necessary IAM Roles. You need IAM Roles to access the secrets in Secrets Manager.
    • Create the source endpoint and then test the connection.
    • Create the target endpoint and then test the connection.
  • Create a Replication Instance:
    • Choose dms.t2.micro for your instance class since it is eligible for the Free Tier.
    • Assign a Public IP to your instance to connect to your MySQL database.
    • For the VPC, I suggest using the Default VPC so you can reuse the subnet group you created in the previous hands-on.
  • Create a Database Migration Task:
    • Use the Replication Instance and Endpoints you created previously.
    • You can choose ‘Migrate existing data’ or ‘Migrate existing data and replicate ongoing changes’ for the Migration type.
    • If you want to try CDC, enable binary logging in your MySQL database.
    • Under the Mapping rules, create a Transformation rule that adds a prefix of ‘pg’ to the target table name.
    • If you leave the default setting of the ‘Migration task startup configuration’, the migration starts immediately once the task is created.
    • You can also run a Premigration Assessment to see if everything is configured correctly.
    • Once the migration is complete, you can look at the Table statistics tab to see how many records have been migrated.
    • Validate the result by counting the records in the AWS RDS PostgreSQL database.
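
The ‘pg’ prefix rule configured above corresponds to DMS table-mapping JSON along these lines. The schema name `testdb` is a placeholder for whatever schema you created earlier:

```python
import json

# DMS table-mapping JSON for the task above: select every table in the
# source schema and add the 'pg' prefix on the target. The schema name
# "testdb" is a placeholder.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "testdb", "table-name": "%"},
            "rule-action": "include",
        },
        {
            "rule-type": "transformation",
            "rule-id": "2",
            "rule-name": "prefix-tables",
            "rule-target": "table",
            "object-locator": {"schema-name": "testdb", "table-name": "%"},
            "rule-action": "add-prefix",
            "value": "pg",
        },
    ]
}

print(json.dumps(table_mappings, indent=2))
```

You can enter the same rules through the console's mapping-rules editor, or supply the JSON via the `--table-mappings` option of `aws dms create-replication-task`.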
