{"id":942,"date":"2024-10-17T09:27:32","date_gmt":"2024-10-17T09:27:32","guid":{"rendered":"https:\/\/192.168.1.3\/wordpress\/?p=942"},"modified":"2024-12-16T09:24:43","modified_gmt":"2024-12-16T09:24:43","slug":"aws-certified-data-engineer-associate-dea-c01-review-material-redshift","status":"publish","type":"post","link":"https:\/\/mylinuxsite.com\/wordpress\/?p=942","title":{"rendered":"AWS Certified Data Engineer Associate (DEA-C01) Review Material \u2013 Redshift"},"content":{"rendered":"\n<!--more continue reading-->\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Overview<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>A fully managed, petabyte-scale data warehouse service in the cloud.<\/li><li>Use <strong>MPP<\/strong> (Massively Parallel Processing) execution to achieve high performance.<ul><li>Multiple compute nodes handle all query processing leading up to final result aggregation, with each core of each node running the same compiled query segments on portions of the entire data.<\/li><li>Redshift distributes the rows of a table to the compute nodes so that the data can be processed in parallel.&nbsp;<\/li><\/ul><\/li><li>Designed for OLAP.<\/li><li><strong>Columnar storage<\/strong> <ul><li>Each data block stores values of a single column for multiple rows. As records enter the system, Amazon Redshift transparently converts the data to columnar storage for each of the columns.<\/li><\/ul><\/li><li>Uses SQL to query data.<\/li><li>Scale up and down on demand.<\/li><li>Uses JDBC\/ODBC or Data API to connect to the database programmatically.<\/li><li>It has built-in Backup and Replication<\/li><li>2 Types of Redshift:<ul><li><strong>Provisioned<\/strong><ul><li>A collection of compute nodes and a leader node, which you <em>manage directly.<\/em><\/li><\/ul><\/li><li><strong>Serverless<\/strong><ul><li>Automatically provisions and manages capacity for you.<\/li><li>Doesn&#8217;t have the concept of a cluster or node.<\/li><li>Uses <strong>Namespace<\/strong> and <strong>Workload<\/strong> to  isolate workloads and manage different resources<\/li><li>Automatically manages resources efficiently and scales, based on workloads, within the thresholds of cost controls.<\/li><\/ul><\/li><\/ul><\/li><li>Anti-Patterns:<ul><li>OLTP<\/li><li>Small Data<\/li><li>Unstructured Data<\/li><li>Blob<\/li><\/ul><\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Common Use Case<\/strong><\/h3>\n\n\n\n<ol class=\"wp-block-list\"><li>Data Analytics<\/li><li>Unified Data Warehouse or Data Lake<\/li><li>Stock Analysis<\/li><li>Social Trend Analysis<\/li><\/ol>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Architecture (Provisioned)<\/strong><\/h3>\n\n\n\n<p><a href=\"https:\/\/docs.aws.amazon.com\/images\/redshift\/latest\/dg\/images\/architecture.png\" data-type=\"URL\" data-id=\"https:\/\/docs.aws.amazon.com\/images\/redshift\/latest\/dg\/images\/architecture.png\">https:\/\/docs.aws.amazon.com\/images\/redshift\/latest\/dg\/images\/architecture.png<\/a><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Components<\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Clients<\/strong>:<ul><li>Integrates with various data loading and ETL and reporting, data mining, and analytics tools.<\/li><li>Uses JDBC\/ODBC or Data API for connection<\/li><li>Based on open standard PostgreSQL, most existing SQL client applications will work with minimal changes.<\/li><li>Only interacts with the leader node.<\/li><\/ul><\/li><li><strong>Cluster<\/strong>:<ul><li>A cluster is composed of one or more&nbsp;<em>compute nodes<\/em>. <\/li><li>A cluster can be created in two ways:<ol><li>Single Node: The node functions as both leader and compute node<\/li><li>Mult-Node: An additional&nbsp;leader node&nbsp;will be added to coordinate the compute nodes and handle external communication.&nbsp;<\/li><\/ol><\/li><\/ul><\/li><li><strong>Leader Node:<\/strong><ul><li>Manages communications with client programs and all communication with compute nodes.<\/li><li>Based on the execution plan, the leader node compiles code, distributes the compiled code to the compute nodes, and assigns a portion of the data to each compute node.<\/li><li>The leader node distributes SQL statements to the compute nodes only when a query references tables stored on them.&nbsp;<\/li><\/ul><\/li><li><strong>Compute Node:<\/strong><ul><li>Run the compiled code and send intermediate results back to the leader node for final aggregation.<\/li><li>Each compute node has its dedicated CPU and memory, determined by the node type.<\/li><li><strong>Node Slices:<\/strong><ul><li>A compute node is&nbsp;partitioned into slices.<\/li><li>Each slice is allocated a portion of the node&#8217;s memory and disk space, processing a portion of the workload assigned to the node.&nbsp;<\/li><li>The node size of the cluster determines the number of slices per node.<\/li><li>Rows are distributed to the node slices according to the distribution key that is defined for a table<\/li><\/ul><\/li><li>Node Type:<ol><li>Dense Compute (DC): Data is stored in the node<\/li><li>RA3: Data is stored in Amazon S3, and the local drive is used as a data cache.<\/li><\/ol><\/li><\/ul><ul><li>A cluster contains one or more databases. <\/li><li>User data is stored on the compute nodes.<\/li><\/ul><\/li><li><strong>Redshift Managed Storage<\/strong>:<ul><li>Only available on &nbsp;RA3 Node.<\/li><li>A separate storage layer apart from the node storage that stores data warehouse data.<\/li><li>Scale your storage to petabytes using Amazon S3 storage.<\/li><li>Lets you scale and pay for computing and storage independently.<\/li><\/ul><\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Durability<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Data is stored in 3 locations within the cluster. <ol><li>On the original node<\/li><li>On a replica of compute nodes<\/li><li>On S3 as a snapshot<\/li><\/ol><\/li><li>Snapshots can be taken automatically or manually. They are point-in-time backups of a cluster stored internally in S3.<ul><li>Automated snapshots are deleted at the end of a retention period.<\/li><li>Manual snapshots are retained indefinitely, even after you delete your cluster.<\/li><li>Can&nbsp;<em><span style=\"color:#a30900\" class=\"has-inline-color\">asynchronously<\/span><\/em>&nbsp;replicate the snapshots to&nbsp;<em>S3 in another region<\/em>&nbsp;for DR.<ul><li>Steps to create a cross-region snapshot (<span style=\"color:#a30000\" class=\"has-inline-color\">only on the provisioned cluster<\/span>):<ol><li>Create a KMS key in the destination region.<\/li><li><span style=\"box-sizing: border-box; margin: 0px; padding: 0px;\">Create a&nbsp;<em>snapshot copy<\/em>&nbsp;<em>grant<\/em>&nbsp;in the destination region specifying the KMS key created <\/span>in Step 1.<\/li><li>In the source AWS Region, enable copying of snapshots and specify the name of the <em><span style=\"color:#cc4909\" class=\"has-inline-color\">snapshot copy grant<\/span><\/em> that you created in the destination AWS Region.<\/li><\/ol><\/li><\/ul><\/li><\/ul><\/li><li>Amazon Redshift will automatically detect and replace a failed node in your data warehouse cluster.<ul><li>The replacement node is made available immediately and loads your most frequently accessed data from Amazon S3 first, allowing you to resume querying your data as quickly as possible.<\/li><li><em>Single-node clusters<\/em> do not support data replication. In the event of a drive failure, you must restore the cluster from a snapshot on S3.<\/li><\/ul><\/li><li><strong>Single-AZ deployment:<\/strong><ul><li>If the cluster&#8217;s Availability Zone becomes unavailable, <strong>Amazon Redshift will automatically move it to another AWS Availability Zone (AZ)<\/strong> without any data loss or application changes. You must enable the relocation capability in your cluster configuration settings to activate this.<\/li><\/ul><\/li><li><strong>Multi-AZ deployment:<\/strong><ul><li>Only for<strong> <\/strong><em><span style=\"color:#a32c00\" class=\"has-inline-color\">provisioned RA3<\/span><\/em> clusters.<\/li><li>It deploys compute resources in two Availability Zones (AZs), which can be accessed through a single endpoint.<\/li><li>You need to create a mirror and then self-manage replication and failover.&nbsp;<\/li><\/ul><\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Scaling<\/strong> <strong>(Provisioned)<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Automatic<\/strong>:<ul><li><strong>Concurrency Scaling <\/strong><ul><li>Vertical scaling<\/li><\/ul><ul><li>It is enabled through WLM.<\/li><li>Automatically adds cluster capacity when your cluster experiences an increase in query queueing.<\/li><\/ul><\/li><\/ul><\/li><li><strong>Manual<\/strong>:<ul><li><strong>Elastic Resize<\/strong><ul><li>Horizontal and Vertical  scaling<\/li><li>Add nodes to or remove nodes from your cluster <em>of the same type.<\/em><\/li><li>Change the node type, such as from DC2 nodes to RA3 nodes (<em>this will drop connection<\/em>)<\/li><li>Completes quickly, taking ten minutes on average.&nbsp;<\/li><\/ul><\/li><li><strong>Classic Resize<\/strong><ul><li>It can change the node type, number of nodes, or both, similar to elastic resize.<\/li><li>It takes more time to complete.<\/li><li>H<strong>andles use cases where the change in cluster size or node type isn&#8217;t supported by elastic resize<\/strong><\/li><\/ul><\/li><li>Resize Steps:<ol><li>A snapshot is created. <\/li><li>A new target cluster is provisioned with the latest data from the snapshot, and data is transferred to the new cluster in the background. During this period, data is read-only. <\/li><li>When the resize nears completion, Amazon Redshift updates the endpoint to point to the new cluster and all connections to the source cluster are dropped.<\/li><\/ol><\/li><\/ul><\/li><li><strong>Snapshot &amp; Restore:<\/strong><ul><li>Near zero downtime.<\/li><li>A new cluster is provisioned. The old cluster is still running.<\/li><li>Use the snapshot for the old cluster to restore to the new cluster.<\/li><\/ul><\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Work Load Management (WLM)<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Manage and prioritize concurrent queries and user workloads to optimize performance and resource utilization.<\/li><li>It allows you to define queues, user groups, and other constructs to control the resources allocated to different types of queries or users.<\/li><li>It can be configured to run either automatic or manual.<\/li><li><strong>Automatic WLM:<\/strong><ul><li>Redshift manages how resources are divided to run concurrent queries with automatic WLM.<\/li><li>Determines how many queries run concurrently and how much memory is allocated to each dispatched query.<\/li><li>It&#8217;s possible to define query priorities for workloads in a queue.<\/li><li><em><span style=\"color:#a32300\" class=\"has-inline-color\">Can create up to eight (8) queues<\/span><\/em> with the service class. Each queue has a priority.&nbsp;<\/li><li>Concurrency is lower when large queries are submitted.<\/li><li>Concurrency is higher when lighter queries are submitted,<\/li><li>Automatic WLM is separate from short query acceleration (SQA)&nbsp;<\/li><\/ul><\/li><li><strong>Manual WLM:<\/strong><ul><li><em><span style=\"color:#a32300\" class=\"has-inline-color\">Not available on Redshift Serverless.<\/span><\/em><\/li><\/ul><ul><li>Manage system performance by creating separate queues for long-running queries and short-running queries.<\/li><li>Each queue is allocated a portion of the cluster&#8217;s available memory. <\/li><li>Each query queue contains a number of query slots. A queue&#8217;s memory is divided among the queue&#8217;s query slots.<\/li><li>By default, Amazon Redshift configures the following query queues:<ol><li><strong>One superuser queue<\/strong><ul><li>It is reserved for superusers only, and it can&#8217;t be configured.&nbsp;<\/li><li>Use this queue only when you need to run queries that affect the system or for <em>troubleshooting<\/em> purposes.<\/li><\/ul><\/li><li><strong>One default user queue<\/strong><ul><li>Configured to run five queries concurrently.<\/li><li>The default queue must be the last queue in the WLM configuration. <\/li><li>Any queries that are not routed to other queues run in the default queue.<\/li><\/ul><\/li><\/ol><\/li><li><em>You can <span style=\"color:#a31600\" class=\"has-inline-color\">add up to eight (8) queues <\/span>to the default queue<\/em>.<\/li><li>Queries in a queue run concurrently until they reach the&nbsp;WLM <em><span style=\"color:#e01e0c\" class=\"has-inline-color\">query slot count, or&nbsp;concurrency&nbsp;<\/span>l<\/em>evel,&nbsp;defined for that queue.<\/li><li>The maximum slot count across all user-defined queues is 50<\/li><\/ul><\/li><li><strong>Short Query Acceleration (SQA):<\/strong><ul><li>Prioritizes selected short-running queries ahead of longer-running queries.<\/li><li>Runs short-running queries in a dedicated space.<\/li><li>Only prioritizes queries that are short-running and are in a user-defined queue.<\/li><li><a href=\"https:\/\/docs.aws.amazon.com\/redshift\/latest\/dg\/r_CREATE_TABLE_AS.html\">CREATE TABLE AS<\/a>&nbsp;(CTAS) statements and read-only queries, such as&nbsp;<a href=\"https:\/\/docs.aws.amazon.com\/redshift\/latest\/dg\/r_SELECT_synopsis.html\">SELECT<\/a>&nbsp;statements, are eligible for SQA.<\/li><li>It uses a machine learning algorithm to analyze each eligible query and predict its execution time but can define how many seconds is &#8216;short&#8217;. <\/li><\/ul><\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Table Optimization<\/strong><\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">Column Compression<\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>Compression&nbsp;is a column-level operation that reduces the size of data when it is stored.&nbsp;<\/li><li>Compression conserves storage space and reduces the size of data read from storage, which reduces the amount of disk I\/O and, therefore, improves query performance.<\/li><li>ENCODE AUTO is the table default in which Amazon Redshift automatically manages compression encoding for all columns.&nbsp;<\/li><li>Avoid compressing the sort key columns<\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Data Distribution<\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>When you load data into a table, Amazon Redshift distributes the rows of the table to each of the node slices according to the table&#8217;s distribution style.<\/li><li><strong>Distribution Styles:<\/strong><ul><li><strong>AUTO<\/strong><ul><li>Redshift assigns an optimal distribution style based on the size of the table data.<\/li><li>Redshift automatically changes the distribution style as the size of the table data grows.<\/li><\/ul><\/li><li><strong>EVEN<\/strong><ul><li>Distributes the rows across the slices <strong>round-robin,<\/strong> regardless of the values in any particular column.&nbsp;<\/li><li>This style is appropriate when a table doesn&#8217;t participate in joins, OR there isn&#8217;t a clear choice between KEY distribution and ALL distribution.<\/li><\/ul><\/li><li><strong>KEY<\/strong><ul><li>The rows are distributed according to the values in one column(the Key).<\/li><li>Keys with matching <strong>hashed<\/strong> values are physically stored together on the same node slice.<\/li><\/ul><\/li><li><strong>ALL<\/strong><ul><li>A copy of the entire table is distributed to every node.<\/li><li>ALL distribution multiplies the storage required by the number of nodes in the cluster, so it takes much longer to load, update, or insert data into multiple tables.&nbsp;<\/li><li>ALL distribution is appropriate only for relatively slow-moving tables, that is, tables that are not updated frequently or extensively.&nbsp;<\/li><\/ul><\/li><\/ul><\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Sort Keys<\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>Can alternatively define one or more of its columns as&nbsp;<em>sort keys<\/em><\/li><li>When data is initially loaded into the empty table, the rows are stored on disk in sorted order.&nbsp;<\/li><li>Sorting enables efficient handling of range-restricted predicates.<\/li><li>Types of Sort Keys:<ol><li><span style=\"color:#f00707\" class=\"has-inline-color\">Compound<\/span>:<ul><li>Made up of all of the columns listed in the sort key definition, in the order they are listed.<\/li><li>It is most useful when a query&#8217;s filter applies conditions, such as filters and joins, that use a prefix of the sort keys.&nbsp;<\/li><li>Compound sort keys might speed up joins, GROUP BY and ORDER BY operations, and window functions that use PARTITION BY and ORDER BY.&nbsp;<\/li><\/ul><\/li><li><span style=\"color:#eb0a0a\" class=\"has-inline-color\">Interleaved<\/span>:<ul><li>It gives equal weight to each column, or subset of columns, in the sort key.<\/li><li>It improves performance for those queries that use different columns for filters.<\/li><\/ul><\/li><\/ol><\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Loading Data <\/strong><\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">Copy Command<\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>Loads data from S3, EMR, DynamoDB and remote host to a table.<\/li><li>The COPY command can read from multiple data files or multiple data streams simultaneously.<\/li><li>Perform the load operation in parallel.<\/li><li>It can decrypt data and unzip files (gzip,izop,bzip2).<\/li><li><strong>Load from S3<\/strong>:<ul><li>Leverages MPP to read and load data in parallel from a file or multiple files in an Amazon S3 bucket.&nbsp;<\/li><li>You can use a <em>manifest<\/em> file, a JSON-formatted file, that lists the data files to be loaded.<\/li><li>Can load data files uploaded to Amazon S3 encrypted using server-side encryption, client-side encryption, or both.<\/li><\/ul><\/li><li><strong>Load from EMR<\/strong>:<ul><li>Load data in parallel from an Amazon EMR cluster configured to write text files to the cluster&#8217;s Hadoop Distributed File System (HDFS) as fixed-width, character-delimited, CSV, or JSON-formatted files.<\/li><\/ul><\/li><li><strong>Load from a Remote Host:<\/strong><ul><li>Load data in parallel from one or more remote hosts, such as Amazon EC2 instances or other computers.&nbsp;<\/li><li>It uses SSH to connect to the remote host and runs commands on the remote hosts to generate text output.<\/li><\/ul><\/li><li><strong>Load from DynamoDB:<\/strong><ul><li>Load a table with data from a single Amazon DynamoDB table.<\/li><\/ul><\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Zero-ETL<\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>A fully managed solution that makes transactional and operational data available in Amazon Redshift from multiple operational and transactional sources.<\/li><li>You don&#8217;t need to maintain an extract, transform, and load (ETL) pipeline.&nbsp;<\/li><li>Sources are currently supported for zero-ETL integrations:<ol><li>Amazon Aurora MySQL<\/li><li>Amazon Aurora PostgreSQL<\/li><li>Amazon RDS for MySQL<\/li><li>Amazon DynamoDB<\/li><\/ol><\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Vacuuming tables<\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>Re-sorts rows and reclaims space in either a specified table or all tables in the current database.<\/li><li>Amazon Redshift automatically sorts data and runs VACUUM DELETE in the background. This lessens the need to run the VACUUM command.&nbsp;<\/li><li>Types of Vaccum command:<ol><li><span class=\"has-inline-color has-ast-global-color-1-color\">FULL<\/span><ul><li>Sorts the specified table and reclaims disk space occupied by rows that were marked for deletion by previous UPDATE and DELETE operations. VACUUM FULL is the default.<\/li><li>Doesn&#8217;t perform a reindex for <em>interleaved<\/em> tables.<\/li><\/ul><\/li><li>SORT ONLY<ul><li>Sorts the specified table without reclaiming space freed by deleted rows.<\/li><li>Useful when reclaiming disk space isn&#8217;t important but re-sorting new rows is important.<\/li><\/ul><\/li><li>DELETE ONLY<ul><li>Reclaims disk space occupied by rows that were marked for deletion by previous UPDATE and DELETE operations.<\/li><\/ul><\/li><li>REINDEX<ul><li>Analyze the distribution of the values in interleaved sort key columns, then perform a full VACUUM operation.<\/li><li>It takes significantly longer than VACUUM FULL because it makes an additional pass to analyze the interleaved sort keys.<\/li><\/ul><\/li><\/ol><\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Streaming<\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>Streaming ingestion provides low-latency, high-speed data ingestion from&nbsp;<a rel=\"noreferrer noopener\" href=\"https:\/\/aws.amazon.com\/\/kinesis\/data-streams\/\" target=\"_blank\">Amazon Kinesis Data Streams<\/a>&nbsp;or&nbsp;<a rel=\"noreferrer noopener\" href=\"https:\/\/aws.amazon.com\/\/msk\/\" target=\"_blank\">Amazon Managed Streaming for Apache Kafka<\/a>&nbsp;to an Amazon Redshift provisioned or Amazon Redshift Serverless database. <\/li><li>Data flows directly from a data-stream provider to an Amazon Redshift provisioned cluster or to an Amazon Redshift Serverless workgroup. There isn&#8217;t a temporary landing area, such as an Amazon S3 bucket<ul><li><strong>MSK<\/strong>:<\/li><\/ul><ul><li><strong>Kinesis<\/strong>:<\/li><\/ul><\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Unloading Data<\/strong><\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">Unload Command<\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>Use the&nbsp;<a href=\"https:\/\/docs.aws.amazon.com\/redshift\/latest\/dg\/r_UNLOAD.html\">UNLOAD<\/a> with a SELECT statement to copy data from a database table to an S3 bucket.<\/li><li>Text data can be unloaded in delimited or fixed-width format, regardless of the data format used to load it. <\/li><li>It can also specify whether to create compressed GZIP files.<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Redshift Data Sharing<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Share <span style=\"color:#f00707\" class=\"has-inline-color\">live<\/span> data across Amazon Redshift clusters or with other AWS services.<\/li><li>It lets you share live data without having to create a copy or move it.<\/li><li>You can share database objects for both reads and writes across different Amazon Redshift clusters or Amazon Redshift Serverless workgroups within the same AWS account, or from one AWS account to another.<\/li><li><strong>Use Cases:<\/strong><ol><li>Supporting different kinds of business-critical workloads&nbsp;<\/li><li>Enabling cross-group collaboration<\/li><li>Delivering data as a service<\/li><li>Sharing data between environments<\/li><li>Licensing access to data in Amazon Redshift&nbsp;(AWS Data Exchange)<\/li><\/ol><\/li><li><strong>Types of Data Sharing<\/strong>:<ol><li><strong>Standard<\/strong> &#8211; share data across provisioned clusters, serverless workgroups, Availability Zones, AWS accounts, and AWS Regions. You can share between cluster types as well as between provisioned clusters and Amazon Redshift Serverless.<\/li><li><strong>AWS Data Exchange<\/strong> &#8211; is a unit of licensing for sharing your data through AWS Data Exchange.<\/li><li><strong>AWS Lake Formation<\/strong> &#8211; share live data across AWS accounts and Amazon Redshift clusters through AWS Lake Formation-managed datashares.<\/li><\/ol><\/li><li>Must use RA3 nodes<\/li><li>The performance of the queries on shared data <strong>depends on the compute capacity<\/strong> of the <strong>consumer<\/strong> clusters.<\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">DBLINK (PostgreSQL Only)<\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>The DBLINK is an object created in the PostgreSQL instance.<\/li><li>PostgreSQL performs queries remotely through extensions:<ul><li>The 1st extension is called the <strong>foreign-data wrapper, postgres_fdw<\/strong>. &nbsp;<\/li><li>The 2nd extension is called dblink. &nbsp;The dblink function allows the entire query to be pushed to Amazon Redshift.&nbsp;<\/li><\/ul><\/li><li>Synthax:<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-preformatted\">CREATE EXTENSION postgres_fdw;\nCREATE EXTENSION dblink;\nCREATE SERVER foreign_server\n        FOREIGN DATA WRAPPER postgres_fdw\n        OPTIONS (host '&lt;amazon_redshift _ip&gt;', port '&lt;port&gt;', dbname '&lt;database_name&gt;', sslmode 'require');\nCREATE USER MAPPING FOR &lt;rds_postgresql_username&gt;\n        SERVER foreign_server\n        OPTIONS (user '&lt;amazon_redshift_username&gt;', password '&lt;password&gt;');<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Redshift Spectrum<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Efficiently query and retrieve structured and semistructured data from files in <strong>Amazon S3.<\/strong><\/li><li>There is no need to load the data into Amazon Redshift tables.<\/li><li>Employ massive parallelism to run very fast against large datasets.&nbsp;<\/li><li>Resides on dedicated Amazon Redshift servers that are independent of your cluster.<\/li><li>Support a variety of data formats (XML not included)<\/li><li>Support Gzip, Bzip2 and Snappy compression. <\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Data API<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Run SQL statements using the Data API operations with the AWS SDK.<\/li><li>Asynchronous.<\/li><li>Provides a secure HTTP endpoint.<\/li><li>Removes the need to manage database drivers, connections, network configurations, data buffering, credentials, and more.<\/li><li>Uses either credentials stored in AWS Secrets Manager or temporary database credentials.<\/li><li>Limits:<ul><li>Max Query Duration: 24 hours<\/li><li>Max Query Result Size: 100 MB<\/li><li>Max Query Statement Size: 100KB<\/li><li>Result Retention: 24 hours<\/li><\/ul><\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Federated Queries<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Query and analyze data across operational databases, data warehouses, and data lakes.&nbsp;<\/li><li>There is no need to load data to Redshift to query the data.<\/li><li>Only read access.<\/li><li>Can work with external databases in Amazon RDS for PostgreSQL, Amazon Aurora PostgreSQL-Compatible Edition, Amazon RDS for MySQL, and Amazon Aurora MySQL-Compatible Edition.<\/li><li>It distributes part of the computation for federated queries directly into the remote operational databases to reduce data movement and improve performance.<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>System Tables<\/strong> <strong>and Views<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>It contains information about how the system is functioning.&nbsp;<\/li><li>You can query these system tables and views the same way that you would query any other database tables.&nbsp;<\/li><li>Types of system tables and views:<ul><li><span style=\"color:#ea1d0a\" class=\"has-inline-color\">SVV<\/span> views contain information about database objects with references to transient STV tables.<\/li><\/ul><ul><li><span style=\"color:#ea1d0a\" class=\"has-inline-color\">SYS<\/span> views are used to monitor query and workload usage for provisioned clusters and serverless workgroups.<\/li><li><span style=\"color:#ea1d0a\" class=\"has-inline-color\">STL<\/span> views are generated from logs that have been persisted to disk to provide a history of the system.<\/li><li><span style=\"color:#ea1d0a\" class=\"has-inline-color\">STV<\/span> tables are virtual system tables that contain snapshots of the current system data. They are based on transient in-memory data and are not persisted to disk-based logs or regular tables.<\/li><li><span style=\"color:#ea1d0a\" class=\"has-inline-color\">SVCS<\/span> views provide details about queries on both the main and concurrency scaling clusters.<\/li><li><span style=\"color:#ea1d0a\" class=\"has-inline-color\">SVL<\/span> views provide details about queries on main clusters.<\/li><\/ul><\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>UDF (User-defined Functions)<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Custom scalar user-defined function<\/li><li>It is stored in the database and is available for any user with sufficient privileges to run.<\/li><li>Three (3) types of UDF:<ol><li><strong>Scalar Python UDFs:<\/strong><ul><li>&nbsp;A Python program that runs when the function is called and returns a single value.<\/li><\/ul><\/li><li><strong>Scalar SQL UDFs<\/strong>:<ul><li>&nbsp;A SQL SELECT clause that runs when the function is called and returns a single value<\/li><\/ul><\/li><li><strong>Scalar Lambda UDFs:<\/strong><ul><li>A custom function defined in AWS Lambda as part of SQL queries.<\/li><li>The input and return data types can be any standard Amazon Redshift data type.<\/li><\/ul><\/li><\/ol><\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Security<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Encryption at rest:<ul><li>Database encryption using KMS or HSM.<\/li><\/ul><\/li><li>Encryption in flight:<ul><li>SSL connection with clients via JDBC\/ODBC.<\/li><li>SSL connection between Amazon S3 or DynamoDB:<\/li><li>Data in transit between AWS CLI, SDK, or API clients and Amazon Redshift endpoints is encrypted and signed.<\/li><\/ul><\/li><li>IAM Roles<\/li><li>VPC\/Security groups <strong><span style=\"color:#a30900\" class=\"has-inline-color\">(Redshift is inside a VPC)<\/span><\/strong><\/li><li>Column-level access through GRANT statement.<\/li><li>Row-level Security (RLS) through RLS policy<\/li><li>Role-based Access Control (RBAC)<\/li><li>Authentication in Query Editor:<ul><li>Username\/Password<\/li><li>IAM Identity Center (including SSO)<\/li><li>Federated user (Temporary Credentials)<\/li><li>Secrets Manager<\/li><\/ul><\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong><span style=\"color:#2ceb0e\" class=\"has-inline-color\">Hands-On<\/span><\/strong><\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"copycommandS3\"><span class=\"has-inline-color has-ast-global-color-0-color\">Load Data from S3 using the Copy Command.<\/span><\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>Download the data that we will load to Redshift from this link: <a rel=\"noreferrer noopener\" href=\"https:\/\/res.data.gov.hk\/api\/get-download-file?name=https%3A%2F%2Fwww.rvd.gov.hk%2Fdatagovhk%2F7.3.csv\" data-type=\"URL\" data-id=\"https:\/\/res.data.gov.hk\/api\/get-download-file?name=https%3A%2F%2Fwww.rvd.gov.hk%2Fdatagovhk%2F7.3.csv\" target=\"_blank\">https:\/\/res.data.gov.hk\/api\/get-download-file?name=https%3A%2F%2Fwww.rvd.gov.hk%2Fdatagovhk%2F7.3.csv<\/a><\/li><li>Upload the file to the S3 bucket:<ul><li><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"322\" class=\"wp-image-972\" style=\"width: 600px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-10-19-14.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-10-19-14.png 1597w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-10-19-14-300x161.png 300w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-10-19-14-1024x549.png 1024w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-10-19-14-768x412.png 768w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-10-19-14-1536x823.png 1536w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/li><\/ul><\/li><li>Create the &#8216;<code>hk_domestic_housing_sales<\/code>&#8216; table in the Redshift Serverless:<ul><li><code>CREATE TABLE IF NOT EXISTS public.hk_domestic_housing_sales (Month VARCHAR(512), PrimarySalesNumber INT, PrimarySalesConsideration INT, SecondarySalesNumber INT, SecondarySalesConsideration INT);<\/code><\/li><\/ul><\/li><li>Using the Query Editor, use the following COPY command to load the data from S3 to the table:<ul><li><code>COPY hk_domestic_housing_sales FROM 's3:\/\/mylinuxsite-dea-c01\/source_data\/hk_housing\/hk_domestic_housing.csv' IAM_ROLE 'arn:aws:iam::&lt;accountNo&gt;:role\/service-role\/AmazonRedshift-CommandsAccessRole-20241004T110943'FORMAT AS CSV IGNOREHEADER 2;<\/code><\/li><li><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"371\" class=\"wp-image-971\" style=\"width: 600px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-10-46-12.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-10-46-12.png 1377w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-10-46-12-300x186.png 300w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-10-46-12-1024x634.png 1024w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-10-46-12-768x475.png 768w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/li><\/ul><\/li><li>Verify the result:<ul><li><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"370\" class=\"wp-image-970\" style=\"width: 600px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-10-46-36.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-10-46-36.png 1374w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-10-46-36-300x185.png 300w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-10-46-36-1024x632.png 1024w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-10-46-36-768x474.png 768w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/li><\/ul><\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"zeroetl\"><span class=\"has-inline-color has-ast-global-color-0-color\">Zero-ETL Integration (From MySQL RDS).<\/span><\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>Create an RDS Instance with a custom Parameter Group:<ul><li>Set the following parameters:<ol><li><code>binlog_format=ROW<\/code><\/li><li><code>binlog_row_image=full<\/code><\/li><\/ol><\/li><li><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"323\" class=\"wp-image-982\" style=\"width: 600px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-14-07-55-2.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-14-07-55-2.png 1585w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-14-07-55-2-300x162.png 300w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-14-07-55-2-1024x552.png 1024w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-14-07-55-2-768x414.png 768w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-14-07-55-2-1536x828.png 1536w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/li><\/ul><\/li><li>Create an EC2 Instance on the same subnet as your RDS instance. You will need this EC2 instance to connect to the RDS instance and create the necessary database and table and load data in the RDS instance.<\/li><li>Create a database and the table &#8216;<code><em>hk_domestic_housing_sales<\/em><\/code>&#8216;:<ul><li><code>CREATE TABLE IF NOT EXISTS hk_domestic_housing_sales (Month VARCHAR(512), PrimarySalesNumber INT, PrimarySalesConsideration INT, SecondarySalesNumber INT, SecondarySalesConsideration INT, <strong>PRIMARY KEY (Month)<\/strong>);<\/code><\/li><li><strong><em>Make sure that your table has a Primary Key.<\/em><\/strong><ul><li><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"102\" class=\"wp-image-1004\" style=\"width: 600px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-16-58-35.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-16-58-35.png 1900w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-16-58-35-300x51.png 300w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-16-58-35-1024x174.png 1024w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-16-58-35-768x130.png 768w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-16-58-35-1536x260.png 1536w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/li><\/ul><\/li><\/ul><\/li><li>Update your Redshift Serverless<strong> enable_case_sensitive_identifier<\/strong> to true.<ul><li>$ <code>aws redshift-serverless update-workgroup --workgroup-name default-workgroup --config-parameters parameterKey=enable_case_sensitive_identifier,parameterValue=true<\/code><\/li><\/ul><\/li><li>Update the Redshift <em>Serverless Resource Policy<\/em>: Whitelist the RDS instance and the account owner.<ul><li><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"243\" class=\"wp-image-983\" style=\"width: 600px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-14-46-03-2.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-14-46-03-2.png 1576w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-14-46-03-2-300x122.png 300w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-14-46-03-2-1024x415.png 1024w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-14-46-03-2-768x311.png 768w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-14-46-03-2-1536x623.png 1536w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/li><\/ul><\/li><li>Create a Zero-ETL Integration in the RDS database:<ul><li><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"327\" class=\"wp-image-996\" style=\"width: 600px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-15-32-11.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-15-32-11.png 1582w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-15-32-11-300x163.png 300w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-15-32-11-1024x558.png 1024w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-15-32-11-768x418.png 768w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-15-32-11-1536x837.png 1536w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/li><\/ul><\/li><li>A corresponding Zero-ETL Integration entry will also be created in Redshift:<ul><li><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"133\" class=\"wp-image-997\" style=\"width: 600px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-15-35-35.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-15-35-35.png 1571w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-15-35-35-300x66.png 300w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-15-35-35-1024x226.png 1024w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-15-35-35-768x170.png 768w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-15-35-35-1536x339.png 1536w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/li><\/ul><\/li><li>When the Zero-ETL Integration is <em>Active<\/em>, create a Database in Redshift:<ul><li><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"123\" class=\"wp-image-1002\" style=\"width: 600px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-06-22.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-06-22.png 1537w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-06-22-300x61.png 300w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-06-22-1024x209.png 1024w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-06-22-768x157.png 768w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/li><\/ul><ul><li>Get the Integration ID:<ul><li><code>SELECT integration_id FROM SVV_INTEGRATION limit 100<\/code><\/li><\/ul><\/li><li>Create a database:<ul><li><code>CREATE DATABASE hk_housing FROM INTEGRATION '0c4ce7f4-78f0-460b-bb50-47ec3e783dc4'<\/code><\/li><\/ul><\/li><\/ul><\/li><li>When the database is created, you will see that the table &#8216;<code><em>hk_domestic_housing_sales<\/em><\/code>&#8216; will be synched from RDS to the Redshift database.<ul><li><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"124\" class=\"wp-image-1003\" style=\"width: 600px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-06-01.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-06-01.png 1526w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-06-01-300x62.png 300w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-06-01-1024x212.png 1024w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-06-01-768x159.png 768w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/li><li>And an empty table is created<\/li><li><img loading=\"lazy\" decoding=\"async\" width=\"400\" height=\"649\" class=\"wp-image-1001\" style=\"width: 400px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-07-34.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-07-34.png 527w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-07-34-185x300.png 185w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/li><\/ul><\/li><li>Load the data used in this <a href=\"#copycommandS3\" data-type=\"internal\">hands-on<\/a> to the RDS instance:<ul><li>Copy the file from S3 to the EC2 instance created previously:<ul><li><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"117\" class=\"wp-image-977\" style=\"width: 600px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-11-59-42.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-11-59-42.png 1024w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-11-59-42-300x58.png 300w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-11-59-42-768x149.png 768w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/li><\/ul><\/li><li>Load the CSV file into the table:<ul><li><code>LOAD DATA LOCAL INFILE '\/home\/ssm-user\/hk_domestic_housing.csv' INTO TABLE hk_domestic_housing_sales FIELDS TERMINATED BY ',' IGNORE 2 LINES;<\/code><\/li><li><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"274\" class=\"wp-image-978\" style=\"width: 600px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-12-04-36.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-12-04-36.png 1890w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-12-04-36-300x137.png 300w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-12-04-36-1024x467.png 1024w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-12-04-36-768x350.png 768w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-12-04-36-1536x701.png 1536w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/li><\/ul><\/li><\/ul><\/li><li>Validate that the data from the RDS instance is synchronized to Redshift:<ul><li><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"392\" class=\"wp-image-1006\" style=\"width: 600px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-24-26.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-24-26.png 1311w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-24-26-300x196.png 300w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-24-26-1024x669.png 1024w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-24-26-768x501.png 768w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/li><\/ul><\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\"><span class=\"has-inline-color has-ast-global-color-0-color\">Redshift Spectrum.<\/span><\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>We will use the same test data as we used in <a href=\"#copycommandS3\">this<\/a> exercise. <\/li><li>Create the external schema in Redshift:<ul><li><code>create external schema athena_schema from data catalog database 'mylinuxsite-dea-c01' iam_role 'arn:aws:iam::<code>&lt;accountNo&gt;<\/code>:role\/service-role\/AmazonRedshift-CommandsAccessRole-20241004T110943'<\/code><\/li><li><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"278\" class=\"wp-image-1036\" style=\"width: 600px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-15-52-01.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-15-52-01.png 1837w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-15-52-01-300x139.png 300w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-15-52-01-1024x475.png 1024w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-15-52-01-768x356.png 768w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-15-52-01-1536x712.png 1536w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/li><\/ul><\/li><li>Create an external table in Redshift:<ul><li><code>create external table athena_schema.hk_domestic_housing(Month VARCHAR(512), PrimarySalesNumber INT, PrimarySalesConsideration INT, SecondarySalesNumber INT, SecondarySalesConsideration INT )row format delimited fields terminated by ','stored as textfile LOCATION 's3:\/\/mylinuxsite-dea-c01\/source_data\/hk_housing\/'table properties ('skip.header.line.count'='2')<\/code><\/li><li><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"279\" class=\"wp-image-1037\" style=\"width: 600px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-15-58-23.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-15-58-23.png 1835w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-15-58-23-300x139.png 300w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-15-58-23-1024x475.png 1024w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-15-58-23-768x357.png 768w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-15-58-23-1536x713.png 1536w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/li><\/ul><\/li><li>You can validate if the external table is created by looking at the Athena catalogue:<ul><li><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"286\" class=\"wp-image-1038\" style=\"width: 600px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-16-00-11.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-16-00-11.png 1790w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-16-00-11-300x143.png 300w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-16-00-11-1024x488.png 1024w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-16-00-11-768x366.png 768w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-16-00-11-1536x732.png 1536w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/li><\/ul><\/li><li>Test to see if you can query the S3 file:<ul><li><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"392\" class=\"wp-image-1006\" style=\"width: 600px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-24-26.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-24-26.png 1311w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-24-26-300x196.png 300w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-24-26-1024x669.png 1024w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-17-17-24-26-768x501.png 768w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/li><\/ul><\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\"><span class=\"has-inline-color has-ast-global-color-0-color\">Use Data API to Query the <em>hk_domestic_housing<\/em> Table.<\/span><\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>We will use Secrets Manager for authentication:<ul><li>Create a Redshift secret:<ul><li><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"277\" class=\"wp-image-1040\" style=\"width: 600px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-19-56-55.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-19-56-55.png 1791w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-19-56-55-300x139.png 300w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-19-56-55-1024x473.png 1024w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-19-56-55-768x355.png 768w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-19-56-55-1536x709.png 1536w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/li><\/ul><\/li><\/ul><\/li><li>Execute a Data API query:<ul><li><code>aws redshift-data execute-statement --region us-east-1 --secret arn:aws:secretsmanager:us-east-1::secret:mylinuxsite-dea-c01-XN7DpZ --workgroup-name default-workgroup --sql \"select * from athena_schema.hk_domestic_housing limit 5;\" --database hk_housing<\/code><\/li><li><img loading=\"lazy\" decoding=\"async\" width=\"2000\" height=\"145\" class=\"wp-image-1041\" style=\"width: 2000px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-20-03-31.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-20-03-31.png 1880w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-20-03-31-300x22.png 300w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-20-03-31-1024x74.png 1024w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-20-03-31-768x56.png 768w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-20-03-31-1536x111.png 1536w\" sizes=\"auto, (max-width: 2000px) 100vw, 2000px\" \/><\/li><\/ul><\/li><li>Get the result:<ul><li><code>aws redshift-data get-statement-result --id 229b7544-b524-41e9-aa41-ae1a3820db39<\/code><\/li><li><img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"219\" class=\"wp-image-1042\" style=\"width: 800px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-20-04-00.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-20-04-00.png 1387w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-20-04-00-300x82.png 300w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-20-04-00-1024x281.png 1024w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-21-20-04-00-768x210.png 768w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><\/li><\/ul><\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\"><span class=\"has-inline-color has-ast-global-color-0-color\">Federated Query with RDS MySQL<\/span><\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>For this hands-on, we will use the same RDS MySQL database we used in the <a href=\"#zeroetl\">Zero-ETL Integration hands-on<\/a>.<\/li><li>Create the external schema in Redshift:<ul><li><code>CREATE EXTERNAL SCHEMA IF NOT EXISTS mysql_federated_query_schema FROM MYSQL DATABASE 'redshiftetl' URI 'database-1.czuii808i5qm.us-east-1.rds.amazonaws.com' IAM_ROLE standard SECRET_ARN 'arn:aws:secretsmanager:us-east-1::secret:mylinuxsite-dea-c01-ub5VRI'<\/code><\/li><li><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"282\" class=\"wp-image-1058\" style=\"width: 600px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-24-16-06-35.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-24-16-06-35.png 1823w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-24-16-06-35-300x141.png 300w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-24-16-06-35-1024x482.png 1024w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-24-16-06-35-768x361.png 768w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-24-16-06-35-1536x723.png 1536w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/li><\/ul><\/li><li>Query the external schema from Redshift:<ul><li><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"277\" class=\"wp-image-1059\" style=\"width: 600px;\" src=\"http:\/\/192.168.1.3\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-24-16-06-51.png\" alt=\"\" srcset=\"https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-24-16-06-51.png 1833w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-24-16-06-51-300x139.png 300w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-24-16-06-51-1024x473.png 1024w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-24-16-06-51-768x355.png 768w, https:\/\/mylinuxsite.com\/wordpress\/wp-content\/uploads\/2024\/10\/Screenshot-from-2024-10-24-16-06-51-1536x710.png 1536w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/li><\/ul><\/li><\/ul>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[11],"tags":[],"class_list":["post-942","post","type-post","status-publish","format-standard","hentry","category-aws-review-notes"],"_links":{"self":[{"href":"https:\/\/mylinuxsite.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/942","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mylinuxsite.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mylinuxsite.com\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mylinuxsite.com\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mylinuxsite.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=942"}],"version-history":[{"count":82,"href":"https:\/\/mylinuxsite.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/942\/revisions"}],"predecessor-version":[{"id":1388,"href":"https:\/\/mylinuxsite.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/942\/revisions\/1388"}],"wp:attachment":[{"href":"https:\/\/mylinuxsite.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=942"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mylinuxsite.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=942"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mylinuxsite.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=942"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}