Kinesis
- A fully managed service that allows you to ingest, buffer, and process streaming data in real-time.
- Can handle any amount of streaming data and process data from hundreds of thousands of sources with very low latencies
- A Producer/Consumer model
- Kinesis Capabilities:
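- Kinesis Data Streams (KDS)
- A serverless streaming data service that makes it easy to capture, process, and store data streams at any scale.
- Producers write records using the AWS SDK, the Kinesis Producer Library (KPL), or the Kinesis Agent.
- Producers emit data records that contain partition keys. Kinesis hashes each partition key with MD5 into a 128-bit integer, and the shard whose hash key range contains that value ingests the record, so records with the same partition key go to the same shard.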
- Shard
- the base throughput unit of an Amazon Kinesis data stream.
- can ingest up to 1,000 data records per second or 1 MB/sec, whichever limit is reached first.
- grouped into Data Streams which will retain data for 24 hours by default, or optionally up to 365 days.
- scaling is done through shards: split shards to increase throughput, merge shards to reduce it
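A minimal sketch of how a partition key picks a shard, assuming the shards split the 128-bit hash key space evenly (as in a newly created stream that has not been resharded). The helper names are hypothetical; the real mapping is done by Kinesis itself, and a producer can override it with an explicit hash key.

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1  # hash keys are unsigned 128-bit integers

def even_shard_ranges(shard_count):
    """Split the hash key space into contiguous ranges, one per shard.
    Assumption: an even split, as for a freshly created stream."""
    size = (MAX_HASH_KEY + 1) // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # the last shard absorbs any remainder so the whole space is covered
        end = MAX_HASH_KEY if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges

def hash_key(partition_key):
    """MD5 of the partition key, read here as a big-endian 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def shard_for(partition_key, ranges):
    """Index of the shard whose hash key range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key not covered by any shard")

ranges = even_shard_ranges(4)
print(shard_for("user-42", ranges))
```

Because the hash is deterministic, every record with partition key `user-42` lands on the same shard, which is what keeps per-key ordering within a shard.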
- Consumers can be an app (AWS SDK, KCL) or other AWS services such as Lambda, Kinesis Data Firehose, or Kinesis Data Analytics
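- KCL (Kinesis Client Library):
- Uses leases, stored in a DynamoDB table, so that a consumer (worker) locks the shard it processes.
- Two consumers cannot hold the lease of the same shard at the same time.
- Each consumed record carries a sequence number, which Kinesis Data Streams assigns when the record is written.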
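A simplified sketch of the lease idea: an in-memory dictionary stands in for the DynamoDB lease table, and a counter check stands in for a conditional write. The class and method names are hypothetical and not part of the KCL API; the real KCL also renews and expires leases, which is omitted here.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Lease:
    owner: Optional[str] = None
    counter: int = 0  # bumped on every successful take

class LeaseTable:
    """Hypothetical in-memory stand-in for the KCL lease table."""

    def __init__(self, shard_ids):
        self.leases: Dict[str, Lease] = {s: Lease() for s in shard_ids}

    def take(self, shard_id, worker, expected_counter):
        """Conditional write: succeeds only if nobody changed the lease since
        the caller read it (mimics a DynamoDB conditional update)."""
        lease = self.leases[shard_id]
        if lease.counter != expected_counter:
            return False
        lease.owner = worker
        lease.counter += 1
        return True

    def owner(self, shard_id):
        return self.leases[shard_id].owner

table = LeaseTable(["shardId-000", "shardId-001"])
seen = table.leases["shardId-000"].counter  # both workers read counter 0
print(table.take("shardId-000", "worker-A", seen))  # True: A now holds the lease
print(table.take("shardId-000", "worker-B", seen))  # False: B's read is stale
```

The second worker fails because its view of the lease is out of date, so only one worker ever owns a given shard at a time.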
- 2 types of consumption mechanisms:
- Shared – 2 MB/sec/shard shared across all consumers
- Enhanced fan-out – 2 MB/sec/shard for each consumer (records are pushed to the consumer)
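A back-of-the-envelope sketch for sizing a provisioned stream from the per-shard limits above: writes need enough shards for both the MB/sec and the records/sec limit, and with shared consumption all consumers split 2 MB/sec per shard. The function is hypothetical and ignores other limits such as the per-shard GetRecords call rate.

```python
import math

# Per-shard limits quoted in these notes.
WRITE_MB_PER_SHARD = 1.0        # MB/sec ingested per shard
WRITE_RECORDS_PER_SHARD = 1000  # records/sec ingested per shard
SHARED_READ_MB_PER_SHARD = 2.0  # MB/sec per shard, shared by all consumers

def shards_needed(write_mb_s, records_s, consumers, enhanced_fan_out=False):
    """Rough shard count for a provisioned stream."""
    by_write_mb = math.ceil(write_mb_s / WRITE_MB_PER_SHARD)
    by_records = math.ceil(records_s / WRITE_RECORDS_PER_SHARD)
    if enhanced_fan_out:
        # each consumer gets its own 2 MB/sec per shard, which always exceeds
        # the 1 MB/sec write limit, so reads never add shards here
        by_read = 0
    else:
        # every consumer reads the whole stream out of the shared 2 MB/sec
        by_read = math.ceil(write_mb_s * consumers / SHARED_READ_MB_PER_SHARD)
    return max(by_write_mb, by_records, by_read, 1)

print(shards_needed(3, 2500, consumers=1))                         # 3
print(shards_needed(3, 2500, consumers=4))                         # 6
print(shards_needed(3, 2500, consumers=4, enhanced_fan_out=True))  # 3
```

With four shared consumers the read side, not the write side, sets the shard count; enhanced fan-out removes that pressure.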
- Billed per shard-hour (provisioned capacity mode)
- Records are immutable.
- Real-time, with replay capability: consumers can re-read records within the retention period.
- Kinesis Agent is a stand-alone Java software application that offers an easy way to collect and send data to Kinesis Data Streams.
- Kinesis Data Firehose (KDF)
- A fully managed service that makes it easy to capture, transform, and load massive volumes of streaming data into a data store or analytics tool.
- Data transformation is optional and is done by invoking a Lambda function
- Writes to destinations in batches (buffered by size or time interval), i.e. near real-time
- Supported destinations include Amazon S3, Amazon Redshift, Amazon OpenSearch Service, HTTP endpoints, and third-party services such as Datadog, New Relic, MongoDB, and Splunk.
- Records that fail delivery or transformation can be backed up to an S3 bucket
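A sketch of a KDF transformation Lambda, written against the record format Firehose uses for data transformation: each input record has a `recordId` and base64 `data`, and each output record returns the same `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and base64 `data`. The `event` field and the uppercase transform are hypothetical examples.

```python
import base64
import json

def lambda_handler(event, context):
    """Uppercase a hypothetical 'event' field, drop records without it,
    and flag records that are not valid JSON as failed."""
    output = []
    for record in event["records"]:
        raw = base64.b64decode(record["data"])
        try:
            payload = json.loads(raw)
        except ValueError:  # covers JSONDecodeError and bad UTF-8
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue
        if not isinstance(payload, dict) or "event" not in payload:
            output.append({"recordId": record["recordId"],
                           "result": "Dropped",
                           "data": record["data"]})
            continue
        payload["event"] = str(payload["event"]).upper()
        # newline-delimited JSON keeps records separable once batched into S3
        body = (json.dumps(payload) + "\n").encode("utf-8")
        output.append({"recordId": record["recordId"],
                       "result": "Ok",
                       "data": base64.b64encode(body).decode("ascii")})
    return {"records": output}
```

Records returned as `ProcessingFailed` are the ones Firehose treats as transformation failures, which is where the S3 backup of failed data applies.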
- Kinesis Data Analytics (KDA)
- A fully managed service for analyzing streaming data in real-time.
- You get a console-based editor to build SQL queries.
- Kinesis Data Analytics Studio supports sub-second queries with built-in visualizations.
- Automatic Scaling
- Real-time
- Integrates with KDS and KDF
- Use cases: Real-time dashboard, metrics, time-series.
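To illustrate what a windowed SQL query in KDA computes, here is a plain-Python sketch of a tumbling-window count: events are grouped into fixed, non-overlapping windows and counted per key. This is not KDA code; the `(epoch_seconds, key)` event format is an assumption for the example.

```python
from collections import Counter, defaultdict

def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.
    events: iterable of (epoch_seconds, key) pairs."""
    counts = defaultdict(Counter)
    for ts, key in events:
        window_start = ts - (ts % window_seconds)  # align to window boundary
        counts[window_start][key] += 1
    return {start: dict(c) for start, c in sorted(counts.items())}

events = [(0, "a"), (5, "b"), (59, "a"), (60, "a"), (61, "a"), (125, "b")]
print(tumbling_window_counts(events, 60))
# {0: {'a': 2, 'b': 1}, 60: {'a': 2}, 120: {'b': 1}}
```

Each window's result is final once the window closes, which is what makes this shape suitable for real-time dashboards and per-minute metrics.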
- Kinesis Video Streams
- makes it easy to securely stream video from connected devices to AWS for analytics, machine learning (ML), playback, and other processing.
SQS
- A fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications.
- A Producer/Consumer model.
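- Standard queues support nearly unlimited throughput (API calls per second), and a queue can hold an unlimited number of messages.
- Consumers must explicitly delete a message (DeleteMessage) after processing it; otherwise it becomes visible again when its visibility timeout expires.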
- Has 2 types:
- Standard
- FIFO
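- Standard queues give best-effort ordering and at-least-once delivery, so messages can be duplicated; FIFO queues guarantee ordering and exactly-once processing.
- A single ReceiveMessage call returns at most 10 messages.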
- FIFO Queue:
- Ordered
- No duplicates
- Limited throughput (compared to Standard)
- Queue name must end with .fifo
- With batching (up to 10 messages per call): up to 3,000 messages/sec per API method
- Without batching: up to 300 API calls/sec per API method (SendMessage, ReceiveMessage, DeleteMessage)
- S3 event notifications cannot target a FIFO queue
- Has access policy (like S3)
- Message Group ID
- A tag that specifies that a message belongs to a specific message group.
- Messages in the same message group are always processed one by one, in strict order within that group.
- Message Deduplication ID
- A token that is used for deduplication of sent messages.
- If a message with a given deduplication ID is sent successfully, any message sent with the same ID during the following 5-minute deduplication interval is accepted but not delivered.
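A toy in-memory model of the two FIFO features above: a message group stays blocked while one of its messages is in flight, and a repeated deduplication ID within 5 minutes is accepted but not enqueued. Class and method names are hypothetical; the real queue also has visibility timeouts, which this model omits.

```python
from collections import deque

DEDUP_INTERVAL = 300  # seconds (5 minutes)

class FifoQueueSim:
    """Hypothetical model of per-group ordering and deduplication."""

    def __init__(self):
        self.groups = {}       # group id -> deque of bodies, in send order
        self.seen = {}         # dedup id -> time it was first enqueued
        self.in_flight = set() # groups whose head message is being processed

    def send(self, body, group_id, dedup_id, now):
        """Return True if enqueued; a duplicate within the interval is
        accepted by the API but silently not enqueued (returns False here)."""
        first = self.seen.get(dedup_id)
        if first is not None and now - first < DEDUP_INTERVAL:
            return False
        self.seen[dedup_id] = now
        self.groups.setdefault(group_id, deque()).append(body)
        return True

    def receive(self):
        """Return (group_id, body) from the first group that is not blocked."""
        for group_id, messages in self.groups.items():
            if messages and group_id not in self.in_flight:
                self.in_flight.add(group_id)
                return group_id, messages[0]
        return None

    def delete(self, group_id):
        """Consumer finished the group's head message; unblock the group."""
        self.groups[group_id].popleft()
        self.in_flight.discard(group_id)
```

While `order-1` has a message in flight, its next message is held back even though `order-2` can still be served, which is how FIFO keeps strict order per group without serializing the whole queue.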
- Request-Response Model:
- Used when a producer requires responses from consumers
- The producer sends a message containing a ‘Correlation Id’ and a ‘Response Queue Name’ to the request queue.
- A consumer responds by sending a message containing the same ‘Correlation Id’ to the queue named in the request.
- Can be implemented with the Amazon SQS Temporary Queue Client.
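A minimal in-memory sketch of the request-response flow, with Python `queue.Queue` objects standing in for SQS queues. Each request gets its own reply queue here, loosely mirroring the temporary queues the Temporary Queue Client manages; all names are hypothetical and no AWS API is called.

```python
import queue
import uuid

# Hypothetical in-memory queues standing in for SQS queues, keyed by name.
queues = {"requests": queue.Queue()}

def send_request(body):
    """Producer: create a reply queue and tag the request with a correlation id."""
    correlation_id = str(uuid.uuid4())
    reply_queue = f"replies-{correlation_id}"
    queues[reply_queue] = queue.Queue()
    queues["requests"].put({"CorrelationId": correlation_id,
                            "ResponseQueueName": reply_queue,
                            "Body": body})
    return correlation_id, reply_queue

def serve_one():
    """Consumer: handle one request and reply to the queue named in it."""
    request = queues["requests"].get_nowait()
    reply = {"CorrelationId": request["CorrelationId"],
             "Body": request["Body"].upper()}  # stand-in for real work
    queues[request["ResponseQueueName"]].put(reply)

def await_reply(correlation_id, reply_queue, timeout=1.0):
    """Producer: wait for the reply and check it belongs to this request."""
    reply = queues[reply_queue].get(timeout=timeout)
    if reply["CorrelationId"] != correlation_id:
        raise ValueError("reply does not match the request")
    return reply["Body"]

cid, rq = send_request("ping")
serve_one()
print(await_reply(cid, rq))  # PING
```

The correlation id is what lets a producer match a reply to its request even when many requests are outstanding at once.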
- Important Configurations:
- Visibility Timeout – period after a message is received during which it is hidden from other consumers. 0 seconds to 12 hours (default 30 seconds). Can be changed per message programmatically (ChangeMessageVisibility API)
- Delivery Delay – time to delay the first delivery of each message added to the queue. 0 – 15 minutes
- Receive Message Wait Time – maximum time a ReceiveMessage call waits for messages to become available (long polling). 0 – 20 seconds; 0 means short polling
- Message Retention Period – time that Amazon SQS retains a message that does not get deleted. 1 minute to 14 days (default 4 days)
- Maximum message size – maximum message size for your queue. 1 – 256 KB.
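A sketch that turns the settings above into the attribute names and units the SQS API uses (seconds for times, bytes for size), checking the ranges listed in these notes. The function name and the example ARN are hypothetical; the resulting dictionary has the string-to-string shape that boto3's `create_queue(QueueName=..., Attributes=...)` accepts.

```python
import json

# Ranges from the notes above, in the units the SQS API uses.
LIMITS = {
    "VisibilityTimeout": (0, 12 * 60 * 60),            # seconds, 0-12 h
    "DelaySeconds": (0, 15 * 60),                       # seconds, 0-15 min
    "ReceiveMessageWaitTimeSeconds": (0, 20),           # seconds
    "MessageRetentionPeriod": (60, 14 * 24 * 60 * 60),  # seconds, 1 min-14 days
    "MaximumMessageSize": (1024, 256 * 1024),           # bytes, 1-256 KB
}

def queue_attributes(dlq_arn=None, max_receive_count=5, **settings):
    """Validate settings and build an SQS attribute dict (values as strings)."""
    attrs = {}
    for name, value in settings.items():
        low, high = LIMITS[name]  # KeyError for unknown attribute names
        if not low <= value <= high:
            raise ValueError(f"{name}={value} outside {low}-{high}")
        attrs[name] = str(value)
    if dlq_arn is not None:
        # redrive policy on the source queue: where and when to move messages
        attrs["RedrivePolicy"] = json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": str(max_receive_count),
        })
    return attrs

print(queue_attributes(VisibilityTimeout=60, ReceiveMessageWaitTimeSeconds=20))
# {'VisibilityTimeout': '60', 'ReceiveMessageWaitTimeSeconds': '20'}
```

Setting `ReceiveMessageWaitTimeSeconds` to 20 turns on long polling for every receive on the queue, which cuts down on empty responses.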
- Dead Letter Queue (DLQ)
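- Messages that can’t be consumed successfully can be moved to a DLQ
- A DLQ is an ordinary SQS queue: the source queue’s redrive policy points to it, and the DLQ’s redrive allow policy controls which source queues may use it. The DLQ of a FIFO queue must also be a FIFO queue
- Maximum Receives (maxReceiveCount) determines when a message is moved to the DLQ: once a message’s receive count exceeds this value, SQS moves it to the DLQ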
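A toy model of the Maximum Receives rule: every receive increments the message's receive count, and a message whose count already equals `max_receive_count` is moved to the DLQ instead of being handed out again. Names are hypothetical; in SQS the move is done by the service through the redrive policy.

```python
from collections import deque

class QueueWithDlq:
    """Hypothetical queue that redrives repeatedly failing messages."""

    def __init__(self, max_receive_count):
        self.max_receive_count = max_receive_count
        self.messages = deque()  # entries are [body, receive_count]
        self.dlq = []

    def send(self, body):
        self.messages.append([body, 0])

    def receive(self):
        while self.messages:
            entry = self.messages.popleft()
            if entry[1] >= self.max_receive_count:
                # another receive would exceed maxReceiveCount: redrive it
                self.dlq.append(entry[0])
                continue
            entry[1] += 1
            return entry
        return None

    def release(self, entry):
        """Processing failed and the message was not deleted, so it
        becomes visible again (as after a visibility timeout)."""
        self.messages.append(entry)

q = QueueWithDlq(max_receive_count=3)
q.send("poison")
for _ in range(3):
    q.release(q.receive())  # every processing attempt fails
print(q.receive(), q.dlq)  # None ['poison']
```

A successfully processed message would simply not be released (the equivalent of DeleteMessage), so it never reaches the DLQ.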
- Delay queues:
- let you postpone the delivery of new messages to consumers for a number of seconds (0 – 15 minutes)
- similar to visibility timeouts because both features make messages unavailable to consumers for a specific period of time
- the difference is that with delay queues a message is hidden when it is first added to the queue, whereas with visibility timeouts a message is hidden only after it is received by a consumer
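A minimal timeline sketch of that difference: the delivery delay hides a message from the moment it is sent, while the visibility timeout hides it from the moment it is received. The 30-second default mirrors the SQS default visibility timeout; the function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Message:
    body: str
    visible_from: float  # earliest time a consumer can receive it

def send(body, now, delay_seconds=0):
    # delay queue: the message is hidden as soon as it is added
    return Message(body, visible_from=now + delay_seconds)

def receive(msg, now, visibility_timeout=30):
    """Return True if msg can be received at `now`; if so, hide it again."""
    if now < msg.visible_from:
        return False
    # visibility timeout: hidden only after a consumer has received it
    msg.visible_from = now + visibility_timeout
    return True

m = send("job", now=0, delay_seconds=10)
print(receive(m, now=5))   # False: still delayed
print(receive(m, now=10))  # True: delay over, now hidden until t=40
print(receive(m, now=20))  # False: hidden by the visibility timeout
print(receive(m, now=40))  # True: not deleted in time, visible again
```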
SNS
- A fully managed messaging service for both application-to-application (A2A) and application-to-person (A2P) communication
- A Pub/Sub model
- Subscribers can be SQS, Lambda, Email, SMS, HTTP/HTTPS, Mobile endpoints.
- Has Access Policy
- Has subscription filter policies: a subscriber receives only the messages whose attributes match its policy
- Can define Message Attributes
- Has DLQ (an SQS queue per subscription for messages that can't be delivered)
- Has Message Group ID and Deduplication ID (FIFO topics only)
- Has TTL (only for Mobile Endpoints)
- Can be combined with SQS for the fan-out pattern (one topic, several subscribed queues). Each queue's access policy must allow the SNS topic to send messages to it
- Has 2 types (similar to SQS):
- Standard
- FIFO – only allows SQS for subscription
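A simplified sketch of fan-out with subscription filters: one topic delivers each published message to every subscribed queue whose filter policy matches the message attributes (AND across keys, OR across the listed values). Real SNS filter policies also support prefix, numeric, and other operators, which are omitted; plain Python lists stand in for SQS queues, and all names are hypothetical.

```python
def matches(filter_policy, attributes):
    """Exact string matching only: every policy key must be present in the
    attributes, with a value equal to one of the listed values."""
    for key, allowed in filter_policy.items():
        if attributes.get(key) not in allowed:
            return False
    return True

class Topic:
    """Hypothetical in-memory topic fanning out to list-based 'queues'."""

    def __init__(self):
        self.subscriptions = []  # (queue, filter_policy or None)

    def subscribe(self, queue, filter_policy=None):
        self.subscriptions.append((queue, filter_policy))

    def publish(self, message, attributes=None):
        attributes = attributes or {}
        for queue, policy in self.subscriptions:
            # no policy means the subscriber receives everything
            if policy is None or matches(policy, attributes):
                queue.append(message)

orders, shipping, audit = [], [], []
topic = Topic()
topic.subscribe(orders, {"event_type": ["order_placed"]})
topic.subscribe(shipping, {"event_type": ["order_shipped"], "region": ["eu", "us"]})
topic.subscribe(audit)
topic.publish("o-1 placed", {"event_type": "order_placed"})
topic.publish("o-1 shipped", {"event_type": "order_shipped", "region": "eu"})
topic.publish("o-2 shipped", {"event_type": "order_shipped", "region": "ap"})
print(orders, shipping, audit)
```

The unfiltered `audit` queue sees every message, while each filtered queue only gets its slice, which is the usual reason to pair fan-out with filter policies instead of filtering in every consumer.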
Amazon MQ
- A managed message broker service for Apache ActiveMQ and RabbitMQ.
- Supports open/standard protocols such as MQTT, AMQP, STOMP
- HA via an active/standby broker pair across two Availability Zones (ActiveMQ), which requires Amazon EFS storage.