Choosing the right event-routing service for serverless: EventBridge, SNS, or SQS

This post was written by Dhaval Nagar, founder of AppGambit, a Lumigo partner company. Dhaval is an AWS Serverless Hero and holds 12 AWS certifications. AppGambit is a serverless-focused AWS Partner Consulting company based in India. 

Serverless is almost synonymous with event-driven architecture, where events are the basic units of information passed around to trigger application logic. For a serverless application to work as one system, each event must reach the right destination with the expected behavior. Events are relayed from one place to another through communication services, either in sequence or in parallel.

AWS has a few communication services for event messaging with some overlapping features. The two services that first come to mind for most people are Amazon SQS and Amazon SNS.

Amazon SQS is one of the oldest services in the Amazon Web Services cloud; it went into production in 2006, the same year as EC2. SQS is a queuing service mainly used to store messages reliably and have a single consuming service process them. Amazon SNS came later with a different communication model: publishers and subscribers. More recently, AWS added another major service, Amazon EventBridge, which extends the existing CloudWatch Events service.

Let’s look at each of these services and explore their use cases.

For brevity, I am leaving out features such as service resiliency, message durability, encryption, privacy, destination-retries, and dead-letter-queues. All of the services are fully-managed, scalable, serverless, and use the pay-per-use billing model.

Amazon SQS — Message Queues

Amazon SQS is a message queuing service. As the name suggests, it provides a queue that a source system pushes messages to and a destination system reads them from. SQS is pull-based: consumers fetch messages from the queue using either short polling or long polling. The Lambda integration uses long polling internally to reduce cost and wasted requests.
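To make the pull model concrete, here is a minimal sketch of a single long-poll cycle. It assumes `sqs` behaves like a boto3 SQS client (for example `boto3.client("sqs")`); the function name `drain_once` and the `handle` callback are hypothetical names used only for illustration.

```python
from typing import Any, Callable


def drain_once(sqs: Any, queue_url: str, handle: Callable[[str], None],
               wait_seconds: int = 20) -> int:
    """Long-poll the queue once and process whatever comes back.

    Returns the number of messages that were processed and deleted.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,        # a single call returns at most 10 messages
        WaitTimeSeconds=wait_seconds,  # > 0 enables long polling; 20 is the maximum
    )
    processed = 0
    for message in response.get("Messages", []):
        try:
            handle(message["Body"])
        except Exception:
            # Do not delete: the message becomes visible again after the
            # visibility timeout and will be delivered again.
            continue
        # SQS removes a message only when the consumer deletes it explicitly.
        sqs.delete_message(QueueUrl=queue_url,
                           ReceiptHandle=message["ReceiptHandle"])
        processed += 1
    return processed
```

With long polling, the call waits up to `wait_seconds` for a message to arrive instead of returning an empty response immediately, which is why it needs fewer requests than short polling on a quiet queue.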

Before the SQS-to-Lambda integration, the standard practice was to integrate SNS with SQS and integrate Lambda with SNS to receive messages.

Picture1 - SQS Lambda Trigger

For more detail, please go through the AWS documentation on using Lambda with Amazon SQS.

Advantages

  • Retain messages for up to 14 days
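  • Possible to batch up to 10 messages per send or receive API call
  • You can use FIFO queues to control message delivery ordering (this lowers throughput compared with standard queues); SQS has no built-in priority queue, so priorities are usually modeled with separate queues
  • Delayed message delivery using delay queues or per-message timers (up to 15 minutes)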

Limitations

  • A message is removed only after the consumer confirms it by deleting it; otherwise it becomes visible again and is redelivered
  • A dead-letter queue becomes a must to keep failing messages from clogging the main queue
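The confirmation behavior matters most when Lambda receives a batch: by default, one failing message causes the whole batch to be retried. As a sketch, the handler below reports only the failed messages back to Lambda. It assumes the event source mapping has `ReportBatchItemFailures` enabled; `process_order` and the `order_id` field are hypothetical stand-ins for real business logic.

```python
import json
from typing import Any, Dict, List


def process_order(order: Dict[str, Any]) -> None:
    """Hypothetical business logic; raises on orders it cannot handle."""
    if "order_id" not in order:
        raise ValueError("order_id is required")


def handler(event: Dict[str, Any], context: Any) -> Dict[str, List[Dict[str, str]]]:
    """Lambda entry point for an SQS event source mapping."""
    failures: List[Dict[str, str]] = []
    for record in event.get("Records", []):
        try:
            process_order(json.loads(record["body"]))
        except Exception:
            # Report only this message as failed so the rest of the batch
            # is deleted from the queue and not processed again.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Messages listed in `batchItemFailures` return to the queue and, once they exceed the queue's configured receive count, move to the dead-letter queue.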

Keep in mind

The AWS Lambda service internally uses the continuous long-polling technique to fetch messages from the SQS queue. This counts against the API requests, so your account will be charged for those API calls. So even if your SQS queue is empty for days, you may still see API usage in your detailed bill report.
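A rough back-of-the-envelope calculation shows how those idle requests add up. The numbers here are assumptions for illustration only: five concurrent pollers and a 20-second long-poll wait. Check the current Lambda documentation and SQS pricing for the values that apply to your account.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def idle_polling_requests(pollers: int, wait_seconds: int, days: int) -> int:
    """ReceiveMessage calls made against an empty queue over `days` days."""
    polls_per_poller_per_day = SECONDS_PER_DAY // wait_seconds
    return pollers * polls_per_poller_per_day * days


# Assumed values: 5 pollers, 20-second long polls, a 30-day month.
print(idle_polling_requests(pollers=5, wait_seconds=20, days=30))  # prints 648000
```

Under these assumptions, a completely idle queue with a Lambda trigger still generates several hundred thousand requests per month.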

Amazon SNS — Publish and Subscribe

Amazon SNS is one of the most heavily used and popular services on AWS. It is designed to distribute events among many subscribers. AWS uses SNS to dispatch millions of emails, SMS messages, and push notifications, where each address/endpoint is registered as a subscriber.

Why it’s popular

Amazon SNS is an easy-to-configure mass-communication service. It was the first major service from AWS that allowed developers to send out email, SMS, and push notifications directly, without the need to set up a separate system.

Amazon SNS is a popular service for implementing the Fan-Out (or “event forking”) pattern, which delivers the same message to multiple heterogeneous targets. As modern serverless applications are heavily reliant on “events”, it’s becoming important to make sure every event is not only processed but also audited for future purposes.

Picture2 - SNS Topic with multiple subscribers

Advantages

  • High throughput and low latency
  • A large number of subscribers (a standard topic supports up to 12.5 million subscribers)
  • Can directly send Email, SMS, and Push messages
  • Message filtering at attributes level
  • Message transformation
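As an illustration of fan-out with attribute-level filtering, the sketch below subscribes an SQS queue to a topic with a filter policy and publishes a message carrying the matching attribute. It assumes `sns` behaves like a boto3 SNS client; the attribute name `event_type` and both function names are hypothetical.

```python
import json
from typing import Any, Dict, List


def subscribe_queue(sns: Any, topic_arn: str, queue_arn: str,
                    event_types: List[str]) -> str:
    """Subscribe an SQS queue to a topic, but only for the given event types."""
    filter_policy = {"event_type": event_types}
    response = sns.subscribe(
        TopicArn=topic_arn,
        Protocol="sqs",
        Endpoint=queue_arn,
        # The filter policy is passed as a JSON string.
        Attributes={"FilterPolicy": json.dumps(filter_policy)},
        ReturnSubscriptionArn=True,
    )
    return response["SubscriptionArn"]


def publish_event(sns: Any, topic_arn: str, event_type: str,
                  payload: Dict[str, Any]) -> str:
    """Publish a message whose attribute subscribers can filter on."""
    response = sns.publish(
        TopicArn=topic_arn,
        Message=json.dumps(payload),
        MessageAttributes={
            "event_type": {"DataType": "String", "StringValue": event_type},
        },
    )
    return response["MessageId"]
```

Each subscriber can carry its own filter policy, so one published message reaches only the queues, functions, or endpoints that asked for that event type. Note that the queue also needs an access policy that allows the topic to send messages to it.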

Limitations

  • Messages are not retained
  • Limited target/destination services

Keep in mind

Built-in traffic monitoring is limited, so you may need to rely on other mechanisms to see what is flowing through a topic.

Learn more about using Amazon SNS with serverless.

Amazon EventBridge — Enterprise Event Bus

Amazon EventBridge is a relatively new service. It is an extension of the existing CloudWatch Events service, which allows developers to create event rules and schedule event triggers to target services. EventBridge is built on the concepts of an Enterprise Event Bus, mainly used to coordinate messaging between a small number of services or systems without creating an individual communication endpoint for each.

EventBridge uses the concept of event buses to route events from a source to different destinations. Each AWS account has one default event bus and can add partner event buses (for authorized AWS partners) and custom event buses.
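As a sketch of how an application might publish to a custom event bus, the function below sends one event and checks the per-entry result. It assumes `events` behaves like a boto3 EventBridge client (`boto3.client("events")`); the source `com.example.orders` and detail type `OrderPlaced` are hypothetical values.

```python
import json
from typing import Any, Dict


def put_order_event(events: Any, bus_name: str, detail: Dict[str, Any]) -> str:
    """Send one event to the given bus and return its event id."""
    response = events.put_events(
        Entries=[{
            "EventBusName": bus_name,
            "Source": "com.example.orders",  # hypothetical source name
            "DetailType": "OrderPlaced",     # hypothetical detail type
            "Detail": json.dumps(detail),    # detail must be a JSON string
        }]
    )
    # The API call can succeed while individual entries fail, so check the count.
    if response.get("FailedEntryCount", 0) > 0:
        entry = response["Entries"][0]
        raise RuntimeError(f"{entry.get('ErrorCode')}: {entry.get('ErrorMessage')}")
    return response["Entries"][0]["EventId"]
```

The publisher only needs to know the bus name; which targets receive the event is decided by the rules attached to that bus.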

Event-driven architecture is not limited to serverless. Many systems, mainly SaaS services, generate informational and transactional events that can be consumed. Before EventBridge, if you needed those events, you would either poll for them at a scheduled interval or configure webhooks to receive them directly from the SaaS service. This practice requires additional effort in terms of infrastructure resources, development, and maintenance. EventBridge allows authorized SaaS partners to push events directly to an event bus, where they can be consumed in your AWS account without managing anything separately.

Picture3 - EventBridge event flow diagram

EventBridge also combines some of the same features as SNS, such as multiple subscribers, filtering events, and event transformation before dispatching the event to the target.
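To show filtering and transformation together, here is a sketch of a rule that matches only larger orders and reshapes the event before it reaches the target. It assumes `events` behaves like a boto3 EventBridge client; the rule name, the source `com.example.orders`, the field names, and the threshold of 100 are all hypothetical.

```python
import json
from typing import Any


def route_large_orders(events: Any, bus_name: str, target_arn: str,
                       rule_name: str = "large-orders") -> None:
    """Create a rule that forwards a trimmed-down event for orders >= 100."""
    pattern = {
        "source": ["com.example.orders"],                # hypothetical source
        "detail-type": ["OrderPlaced"],
        "detail": {"total": [{"numeric": [">=", 100]}]},  # content-based filter
    }
    events.put_rule(
        Name=rule_name,
        EventBusName=bus_name,
        EventPattern=json.dumps(pattern),
        State="ENABLED",
    )
    events.put_targets(
        Rule=rule_name,
        EventBusName=bus_name,
        Targets=[{
            "Id": f"{rule_name}-target",
            "Arn": target_arn,
            # Pick two fields out of the event and build a new payload from them.
            "InputTransformer": {
                "InputPathsMap": {
                    "orderId": "$.detail.order_id",
                    "total": "$.detail.total",
                },
                "InputTemplate": '{"order": <orderId>, "amount": <total>}',
            },
        }],
    )
```

Unlike SNS attribute filtering, the event pattern here matches on the content of the event itself, so publishers do not need to duplicate fields into message attributes.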

Advantages

  • AWS events
  • Authorized Partner Events
  • Cross-account or cross-application integration
  • Message filtering
  • Message transformation
  • Event archiving and replay
  • A wide range of target AWS services (and you can still send an event to outside services through API Gateway routing or a Lambda function call)
  • Schema registry to check event structure

Limitations

  • Lower throughput than SNS and SQS
  • Higher latency than SNS and SQS

Keep in mind

EventBridge still has a relatively low number of partners, although that is expected to grow with time.

Service-Level Latency

Each of these services suits different use cases and has different service-level throughput and latency. Here is a snapshot of a test run with 10,000 messages, which shows a clear difference in latency among the three. EventBridge has the highest latency, in the range of 300 ms to 600 ms.

This gives you a better idea of which service is right for you in terms of latency and throughput.

Picture4 - comparing latency of eventbridge, sqs, and sns
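If you want to run a similar comparison yourself, one common approach (an assumption here, not a description of how the test above was instrumented) is to stamp each message with its send time, record the receive time in the consumer, and summarize the differences. The function name `latency_summary` is hypothetical.

```python
import statistics
from typing import Dict, List


def latency_summary(sent_ms: List[float], received_ms: List[float]) -> Dict[str, float]:
    """Summarize end-to-end latency from paired send/receive timestamps (ms)."""
    latencies = sorted(r - s for s, r in zip(sent_ms, received_ms))
    # 99 cut points; "inclusive" keeps every percentile within the observed range.
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    return {
        "p50": statistics.median(latencies),
        "p99": cuts[98],
        "max": latencies[-1],
    }
```

Looking at the median and the tail separately is useful because a service can have a low typical latency and still show occasional slow deliveries.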

Feature-Level Comparison

All three services have some overlapping and some differentiating features. To highlight the differences, I have put together this comparison chart.

No single service is the clear-cut answer for every use case, but this chart will help you identify which one is more suitable for yours.

Picture5 - comparing features of eventbridge, sqs, sns

Monitoring and Debugging

Each one of these three services allows you to decouple different services within your application, which is an architectural advantage. However, this makes the architecture distributed and therefore hard to monitor and debug in a production environment. The application will fail at some point, and when it does, it will be difficult to piece together the individual log blocks and find out what went wrong, where, and when.
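One general technique that makes individual log blocks easier to join is to pass a correlation ID along with every event and include it in every log line. The sketch below is a minimal illustration of that idea; the envelope shape and the function names are hypothetical and not tied to any particular tool.

```python
import json
import uuid
from typing import Any, Dict, Optional


def with_correlation_id(payload: Dict[str, Any],
                        incoming: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Wrap a payload in an envelope that carries one id across every hop."""
    # Reuse the id from the event that triggered us, or start a new trace.
    correlation_id = (incoming or {}).get("correlation_id") or str(uuid.uuid4())
    return {"correlation_id": correlation_id, "payload": payload}


def log_line(envelope: Dict[str, Any], service: str, message: str) -> str:
    """Build a structured log line that can be searched by correlation id."""
    return json.dumps({
        "correlation_id": envelope["correlation_id"],
        "service": service,
        "message": message,
    }, sort_keys=True)
```

A function that receives an envelope from SQS, SNS, or EventBridge would pass it as `incoming` when it emits its own events, so a single search for the ID returns the log lines from every hop of that transaction.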

Imagine your application looks like this, having multiple messaging services between Lambda functions and you want to isolate and debug an issue.

Picture6 - distributed event-driven serverless application diagram

If you rely only on Amazon CloudWatch and AWS X-Ray, the output would look something like this, which makes it difficult to identify what went wrong.

Picture7 - cloudwatch and xray cannot properly track a serverless app

By using an external observability service such as Lumigo, which specializes in the observability and tracing of serverless applications, we can get a more refined view of the service flow along with the transactional logs.

Picture8 - distributed tracing and unified logging of serverless with Lumigo

So if any particular transaction fails, Lumigo can send alerts as well as record the full transactional logs of the event, including the event payload and the logs collected from CloudWatch Logs.

Picture9 - Lumigo vs CloudWatch payload and logs

Conclusion

Serverless applications are distributed and event-driven by nature. AWS offers three major services for routing events: SNS, SQS, and the more recently launched EventBridge. In this post, we reviewed how each service works a bit differently and suits different use cases. Finally, we covered how to approach monitoring and debugging of event-driven, distributed applications using third-party tools such as Lumigo.

Want to learn more? Watch our recorded webinar

Lumigo and AppGambit presented this topic in a joint webinar on Thursday, December 17, 2020. You can now watch the recording here:
