SQS - Nejati Notes

Amazon Simple Queue Service (Amazon SQS) offers a *durable* queue that lets you integrate and *decouple* distributed software systems and components.

- SQS is protected by default AWS server-side encryption (SSE). Custom [SSE](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-server-side-encryption.html) with keys managed by AWS KMS (Key Management Service) can also be configured.
- Standard queues support [at-least-once message delivery](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/standard-queues-at-least-once-delivery.html), and FIFO queues support [exactly-once message processing](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues-exactly-once-processing.html) and [high-throughput](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/high-throughput-fifo.html) mode.
- To enhance request capacity in *high throughput FIFO queues*, increasing the number of *message groups* is recommended.
- Amazon SQS can process each [buffered request](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-client-side-buffering-request-batching.html) independently. Messages can be grouped and buffered in the client before sending to reduce API calls.
- Amazon SQS *locks* your messages during processing. The "lock" (visibility timeout) in SQS is a mechanism that temporarily hides a message from other consumers after one consumer has retrieved it, giving that consumer time to process it.

### Comparison with SNS and MQ

- Typically, each message in an SQS queue is intended to be processed by only one consumer (or "subscriber") application or service.

> [!info]
> **Contrast with Fanout (SQS + SNS):** For wider distribution to _multiple different kinds of subscribers_ that all need a copy of the _same message_, you integrate SQS with Amazon SNS (Simple Notification Service).
> This is the "fanout" pattern:
> - A producer sends a message to an **SNS topic**.
> - Multiple **SQS queues** (each potentially serving a different microservice or application component) can subscribe to that SNS topic.
> - SNS then delivers a copy of the message to **each subscribed SQS queue**.
> - Each SQS queue then has its own set of consumers (still following the "single subscriber processes a given message from _its_ queue" principle) to handle the message according to its specific needs.

- **Amazon SNS** allows publishers to send messages to multiple subscribers through topics, which serve as communication channels. Subscribers receive published messages using a supported endpoint type, such as [Amazon Data Firehose](https://docs.aws.amazon.com/firehose/latest/dev/what-is-this-service.html), [Amazon SQS](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/welcome.html), [Lambda](https://docs.aws.amazon.com/lambda/latest/dg/welcome.html), HTTP, email, mobile push notifications, and mobile text messages (SMS).
- **Kafka** is an append-only log. Messages in Kafka are persisted for a configurable retention period, regardless of whether they've been consumed. Kafka supports **multiple, independent consumer groups** subscribing to the same topic. Each consumer group tracks its own progress (offset) through the partitions of a topic.

### Standard vs FIFO

#### Standard Queues

- **Delivery**: Offers **at-least-once delivery**, meaning a message is delivered at least once, but occasionally more than one copy of a message might be delivered.
- **Ordering**: Provides **best-effort ordering**, which means messages are generally delivered in the order they are sent, but this is not guaranteed.
- **Throughput**: Designed for **maximum throughput**, offering nearly unlimited transactions per second (TPS).
- **Use Cases**: Best for applications that can process messages that arrive out of order and can handle messages being delivered more than once (idempotent processing). Examples include decoupling microservices, distributing asynchronous tasks (like image processing or sending emails), and buffering batch jobs.

#### FIFO (First-In, First-Out) Queues

- **Delivery**: Guarantees **exactly-once processing**. SQS automatically handles message deduplication to ensure a message is delivered once and remains available until a consumer processes and deletes it.
- **Ordering**: Enforces **strict first-in, first-out order** of messages _within a message group_. Messages with the same message group ID are delivered and processed in the order they were sent.
- **Throughput**: Supports high throughput, but has limits compared to standard queues (e.g., up to 3,000 messages per second per API action with batching, or 300 messages per second without batching by default, though this can be increased).
- **Message Groups**: A key feature that allows multiple ordered message streams within a single FIFO queue. Messages within the same group are processed in order.
- **Use Cases**: Ideal for applications where the order of operations and events is critical, and duplicates cannot be tolerated. Examples include financial transactions, inventory management, or any workflow requiring strict sequencing.

> [!warning]
> Do not add personally identifiable information (PII) or other confidential or sensitive information in queue names.

### SQS Dead-Letter Queues (DLQs)

**Purpose:** To isolate and handle messages that source queues cannot process successfully, preventing system blockage and aiding in debugging. 🧐

#### Core Functionality

- **What it is:** A separate SQS queue designated to receive messages that fail processing in a source queue after a certain number of attempts.
- **Redrive Policy:** Configured on the _source queue_. It specifies:
    - The **ARN of the DLQ**.
    - The **`maxReceiveCount`**: The number of times a message can be received from the source queue before being moved to the DLQ.
- **Message Handling:** Problematic messages ("poison pills") are moved from the source queue to the DLQ, allowing the source queue to continue processing valid messages.

#### Considerations

- **Queue Type Consistency:** DLQ must be the same type as its source queue (Standard to Standard, FIFO to FIFO).
- **FIFO Behavior:** Message movement to a FIFO DLQ respects message group ID ordering.
- **Manual Reprocessing:** Messages are _not_ automatically moved back from a DLQ to the source queue; this requires a manual or custom process.
- **Monitoring:** Essential to set up CloudWatch alarms for DLQ activity (e.g., number of messages) to detect and address issues promptly. 📊

### Polling: Short vs. Long

SQS consumers retrieve messages by "polling" the queue. There are two types of polling:

#### Short Polling

- **Behavior**: When a consumer requests messages, SQS samples only a **subset of its servers** (not all) and returns messages only from those sampled servers.
- **Response Time**: Returns a response **immediately**, even if no messages are found in the sampled servers (resulting in an empty response).
- **Default**: This is the default behavior if you don't configure a `ReceiveMessageWaitTimeSeconds` or if you set it to `0`.
- **Use Cases**:
    - When you need the quickest possible response to a receive request.
    - If your application can tolerate frequent empty responses.
- **Implications**:
    - Can lead to **more empty responses** and thus higher API call volume (potentially higher cost).
    - Might not retrieve a message even if it exists on a non-sampled server.

#### Long Polling ⏱️

- **Behavior**: When a consumer requests messages, SQS queries **all of its servers**. If no messages are available, the request "waits" (hangs) for a specified duration for messages to arrive before sending a response.
- **Configuration**: Enabled by setting the `WaitTimeSeconds` parameter in the `ReceiveMessage` API call, or the `ReceiveMessageWaitTimeSeconds` attribute in the queue configuration, to a value **greater than 0 seconds** (up to 20 seconds).
- **Response Time**:
    - Returns a response as soon as one or more messages are found on any server.
    - Returns an empty response once the configured wait time elapses if no messages arrive during that period.
- **Use Cases**:
    - Preferred for most applications to reduce empty responses and improve efficiency.
    - When you want to receive messages as soon as they arrive without constantly re-polling.
- **Implications**:
    - **Reduces the number of empty responses**, which can lower SQS costs.
    - Eliminates "false empty responses" (where a message is present but not found by short polling).
    - Can reduce the overall number of receive requests.

**Long polling is generally recommended** as it's more efficient and cost-effective for most use cases by minimizing empty `ReceiveMessage` responses. It allows SQS to wait for messages to appear in the queue before responding to the `ReceiveMessage` call.

### SQS Delay Queues ⏳

**Purpose:** To postpone the delivery of **all new messages** sent to a specific SQS queue for a configurable duration. This means messages become visible to consumers only after the queue's delay period has passed.

- **What it is:** A standard or FIFO queue configured with a **default delay period** (from 0 seconds up to 15 minutes).
- **How it works:**
    - When you send a message to a delay queue, it remains invisible to consumers for the duration of the queue's `DelaySeconds` setting.
    - After the delay period expires for a given message, it becomes visible and can be processed by consumers.
- **Scope:** The delay is applied at the **queue level**, affecting all messages sent to that queue unless overridden by a message timer.
- **Contrast with [[SQS#Message Timers|Message Timers]]:**
    - **Delay Queues:** Set a default delay for _all_ messages arriving in the queue.
    - **Message Timers:** Allow producers to set a specific delay (0 seconds to 15 minutes) for _individual_ messages using the `DelaySeconds` parameter in the `SendMessage` API call. Message timers override the queue's default delay setting.

![[sqs-delay-queues-diagram 1.png]]

#### Considerations

- **Maximum Delay:** The maximum delay for both queue-level settings and individual message timers is **15 minutes (900 seconds)**.
- **Existing Messages:** Changing the delay setting on a *standard* queue is not retroactive: it does not affect messages already in the queue and only applies to messages sent after the change. On a *FIFO* queue the change is retroactive and also affects the delay of messages already in the queue.
- **FIFO Queues:** Delay queues can be FIFO queues. The delay affects when a message becomes available for processing, but the FIFO order within a message group is still maintained once messages become visible.

### Temporary Queues 🛠️

**Purpose:** Efficiently manage many short-lived reply queues for request-response patterns, saving costs & reducing API calls. Primarily a **client-side library feature**.

- **Virtual Queues:** Client library maps multiple logical "temporary queues" onto a **single, actual SQS *host queue***.
- **Workflow:**
    1. **Requester:** Client creates a "virtual queue" (in-memory). Request message includes virtual queue ID, sent to host queue.
    2. **Responder:** Sends reply to the _same host queue_, tagging it with the virtual queue ID.
    3. **Requester Client:** Polls host queue, dispatches messages to correct in-memory virtual queue based on ID.
- **Client-Managed:** Library handles creation, routing, and cleanup.

#### Considerations

- **Cost Reduction:** Fewer SQS API calls (no individual queue creation/deletion per reply). 💰
- **Low Latency:** "Virtual queue" creation is instant (client-side).
- **Simplified Management:** Library abstracts reply queue complexity.
- **Not a native SQS type:** It's a client library pattern (e.g., the Amazon SQS Temporary Queue Client for Java).
- **Relies on Host Queues:** Uses standard SQS queues as a backend.
- **Uses Message Attributes:** For routing to virtual queues.

### Message Timers

Message timers allow you to set an initial invisibility period for a message when it's added to a queue. For example, if you send a message with a 45-second timer, it remains hidden from consumers for the first 45 seconds. The default (minimum) delay for a message is 0 seconds. The maximum is 15 minutes. For information about sending messages with timers using the console, see [Sending a message using a standard queue](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/creating-sqs-standard-queues.html#sqs-send-messages).

> [!note]
> FIFO queues don't support timers on individual messages.

To set a delay period on an entire queue, rather than on individual messages, use [[SQS#SQS Delay Queues ⏳|delay queues]]. A message timer setting for an individual message overrides any `DelaySeconds` value on an Amazon SQS delay queue.
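
The message timer rules in this section (0 to 900 seconds, no per-message timers on FIFO queues) can be sketched as a small helper that builds the arguments for a `SendMessage` call. This is only an illustration under assumptions: `build_timer_params` and `send_with_timer` are hypothetical names, the queue URL in the usage line is a placeholder, and the FIFO check relies on the `.fifo` suffix that FIFO queue names carry. The actual send uses boto3's `send_message` and needs AWS credentials, so it is kept separate from the validation.

```python
from typing import Optional

MAX_DELAY_SECONDS = 900  # 15 minutes, the maximum for a message timer


def build_timer_params(queue_url: str, body: str,
                       delay_seconds: Optional[int] = None) -> dict:
    """Return keyword arguments for SendMessage with an optional message timer.

    Hypothetical helper for illustration; it is not part of any AWS SDK.
    """
    params = {"QueueUrl": queue_url, "MessageBody": body}
    if delay_seconds is None:
        # No timer: the queue's own DelaySeconds setting (if any) applies.
        return params
    if not 0 <= delay_seconds <= MAX_DELAY_SECONDS:
        raise ValueError(f"delay_seconds must be between 0 and {MAX_DELAY_SECONDS}")
    if queue_url.endswith(".fifo"):
        # FIFO queues only support a queue-level delay, not per-message timers.
        raise ValueError("FIFO queues don't support timers on individual messages")
    params["DelaySeconds"] = delay_seconds  # overrides the queue-level delay
    return params


def send_with_timer(queue_url: str, body: str, delay_seconds: int) -> dict:
    """Send the message via boto3 (requires credentials and network access)."""
    import boto3  # imported lazily so the helper above runs without the SDK

    sqs = boto3.client("sqs")
    return sqs.send_message(**build_timer_params(queue_url, body, delay_seconds))
```

Usage might look like `send_with_timer("https://sqs.us-east-1.amazonaws.com/123456789012/jobs", "resize image 42", 45)`, which would keep the message hidden from consumers for its first 45 seconds in the queue.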