“Serverless computing is a cloud-computing execution model in which the cloud provider runs the server, and dynamically manages the allocation of machine resources. Pricing is based on the actual amount of resources consumed by an application.”
— “Serverless Computing”, Wikipedia
This mundane description undersells one of the major shifts of recent years. Serverless allows you, as a developer, to concentrate exclusively on what you do best: building your product, without worrying about the infrastructure. So what happens when we take the serverless mindset and implement a CI/CD pipeline with the same approach? A supercharged CI/CD flow.
In this post, I’ll describe Lumigo’s journey to using 100% serverless technologies in its CI/CD pipeline, including:
Serverless is different from traditional architectures, and that affects the way CI/CD is managed. Serverless applications are distributed, often with hundreds of components. As a result:
When we create a serverless development workflow at Lumigo, three principles guide us:
Serverless requires testing in a real AWS environment; mocks of AWS services don’t cut it. Originally we used a shared environment in which each developer used a different prefix to identify their resources, for example dev_john_<lambda_name>. Very soon we realized that the developers were stepping on each other’s toes: they deleted or configured the wrong resources, and we were unable to block an account when it reached its budget threshold.
We quickly moved to an approach in which each developer has a separate AWS environment under their own name, using AWS Organizations to manage the accounts and consolidate billing. In addition, we invest heavily in tools that enable the developers to quickly and efficiently deploy their code to the AWS environment.
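As a rough illustration of the one-account-per-developer idea, the sketch below builds one AWS Organizations CreateAccount request per developer. It is not Lumigo’s tooling: the function names, the `dev-<name>` account naming, the `owner` tag, and the plus-addressed email scheme are all assumptions made for the example. The `org_client` argument is expected to behave like boto3’s Organizations client, whose `create_account` call accepts `Email`, `AccountName`, and `Tags`.

```python
from typing import Any, Dict, List


def build_account_request(developer: str, email_domain: str) -> Dict[str, Any]:
    """Build the arguments for one Organizations CreateAccount call.

    The naming and email conventions here are hypothetical.
    """
    name = developer.strip().lower()
    return {
        # Every AWS account needs its own unique root email address.
        "Email": f"aws+{name}@{email_domain}",
        "AccountName": f"dev-{name}",
        "Tags": [{"Key": "owner", "Value": name}],
    }


def create_dev_accounts(
    org_client: Any, developers: List[str], email_domain: str
) -> List[str]:
    """Request one member account per developer and return the request ids."""
    request_ids = []
    for developer in developers:
        request = build_account_request(developer, email_domain)
        response = org_client.create_account(**request)
        # CreateAccount is asynchronous: the response carries a request id,
        # not the finished account.
        request_ids.append(response["CreateAccountStatus"]["Id"])
    return request_ids
```

With boto3 installed, `org_client` would typically be `boto3.client("organizations")`, called from the organization’s management account. Because account creation is asynchronous, a real script would also poll the request status before using the new account.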
A big part of serverless is the notion that you should outsource everything that isn’t part of the core of your business. We took this mindset and implemented it across our entire technology stack. For CI/CD, code quality, and security checks, we don’t develop in-house anything that we can outsource.
We will always choose the serverless solution over the non-serverless one. For example, we use Aurora Serverless, not provisioned RDS; DynamoDB, not DocumentDB; and so on.
Developers are the sole owners of the product they develop, from product management and design through coding, testing, deploying, and monitoring. That does not mean they are the ones who do everything. We have a dedicated product team, but the developers participate actively and have a strong say in the way the product will behave. In addition, we don’t have any specialized QA engineers. Instead, all the testing is done by the developers themselves, and we invest heavily in test automation. The same goes for our monitoring efforts: the developers monitor the features and bugs they release to production.
A lot of responsibility falls on the developer’s shoulders.
Personal environments are not the only AWS environments at Lumigo. We have a couple of shared integration environments that are part of the automated CI/CD process, and we have two production environments: the environment our customers use, and a monitoring environment that runs our own product to monitor the customer-facing environment.
We are eating our own dog food, which helps us both find potential issues ahead of time and sharpen our product. New features have often been added to our product after internal feedback about missing capabilities. We are a serverless power user and we have seen many times that capabilities that were requested internally by our own team were eventually requested by our customers too.
The famous infinity loop guides us as well in our internal serverless development flow. In this section, I’ll go over each phase and discuss how it affects our development flow.
When Lumigo just started and we were small we used Kanban to drive our workflow. We had a long list of prioritized tasks and each developer picked the top one. As we grew, management wanted more visibility so we moved to a more traditional Scrum. Now, we use a mix of Kanban and Scrum.
Each of our sprints is a week long. We keep them short on purpose to keep things moving fast, but we don’t wait for the end of the sprint to deliver. We are very CD-oriented. When a piece of code passes through all of our gates, it’s pushed to production.
We use GitHub to store our code and follow the GitHub flow, in which you have only master and feature branches. Each merge from a feature branch to master means a deployment to production.
In the early days, we had a very heated discussion regarding mono vs. multi repo. We chose multi because it was more suited for microservice deployments for the following reasons:
Read more about the mono vs. multi repo issue here.
Whether we’re dealing with a dynamic or static language, linting is a mandatory step. Lumigo’s linting flow includes:
We try to run the linting process as early as possible in the build process, and as early as possible means locally. Everything is baked into a git pre-commit hook. We use the pre-commit framework, which gives us the ability to run the linting process either manually or in CI environments.
The best way to test our code is in an actual AWS environment. In the beginning, orchestrating our code deployment was very manual, but as the number of services grew, as well as their inter-dependencies, a special tool was built internally to orchestrate the deployment. We call this tool uber-deploy.
Uber-deploy enables our developers to easily install these services in their environment, so no one needs to know the various dependencies.
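To make the dependency problem concrete, here is a minimal sketch of how a deployment order can be derived from declared dependencies with a topological sort. It is not the uber-deploy implementation: the `deployment_order` function and the service names are hypothetical, and the only library used is Python’s standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def deployment_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return services ordered so each one comes after its dependencies.

    `dependencies` maps a service to the set of services it depends on.
    TopologicalSorter raises graphlib.CycleError if the graph has a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical services, used only for illustration.
services = {
    "shared-resources": set(),
    "ingestion": {"shared-resources"},
    "api": {"shared-resources", "ingestion"},
}

order = deployment_order(services)
print(order)
```

A tool built around this idea only needs each service to declare what it depends on; the developer asks for a service and the ordering falls out of the graph.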
The uber-deploy tool works like this:
As part of feature design, each developer decides which type of tests should be added.
We use three types of tests:
Because testing in the cloud is slower than testing locally, we prefer to detect as many issues as possible before pushing to remote. As mentioned above, we use pre-commit for unit testing and linting.
Non-serverless resources usually run on a local container so you can easily test your end-to-end flow locally. We tried to do the same for serverless resources, but it didn’t work well.
So as a rule, we don’t use service mocks; we always run things in an AWS environment. The one exception is local unit testing, where we do use moto.
One of our system’s components is an SDK, which our customers need to embed. The SDK is very sensitive and a bug there means the Lambda will crash. So we use one of our internal systems as a staging environment for the SDK. As I said, we believe in eating our own dog food.
We created an automated flow that releases an alpha version to NPM, which is then deployed to our staging environment. After a successful deployment, a step function is triggered which waits 2 hours before making a full release of the SDK. The step function gives us the possibility to stop the deployment at any moment in case we identify an issue.
You can read more about this flow here.
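The delayed-release step can be pictured as a two-state Step Functions state machine: a Wait state followed by a Task state that publishes the full release. The sketch below builds such a definition in Amazon States Language as a Python dict. It is an assumption about the shape of the flow, not Lumigo’s actual definition; the state names, the `build_release_state_machine` function, and the Lambda ARN are placeholders.

```python
import json
from typing import Any, Dict

TWO_HOURS_SECONDS = 2 * 60 * 60


def build_release_state_machine(publish_lambda_arn: str) -> Dict[str, Any]:
    """Build an Amazon States Language definition: wait, then publish."""
    return {
        "Comment": "Wait before promoting an alpha SDK build to a full release",
        "StartAt": "WaitBeforeRelease",
        "States": {
            # Soak period: the alpha runs in staging while the execution waits.
            "WaitBeforeRelease": {
                "Type": "Wait",
                "Seconds": TWO_HOURS_SECONDS,
                "Next": "PublishFullRelease",
            },
            # Hypothetical Lambda that performs the full release.
            "PublishFullRelease": {
                "Type": "Task",
                "Resource": publish_lambda_arn,
                "End": True,
            },
        },
    }


definition = build_release_state_machine(
    "arn:aws:lambda:us-east-1:123456789012:function:publish-sdk"  # placeholder
)
print(json.dumps(definition, indent=2))
```

Stopping the execution while it sits in the Wait state means the publish task never runs, which is what makes the waiting period usable as a manual abort window.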
The code is written and we are ready to release, but not so fast. Merging to master means automatic deployment to production, so there are several gates the pull request has to pass through before the developer can merge. Except for code review, which is a manual process, everything else is automated. Automation includes:
Monitoring is hard in serverless for several reasons:
To address monitoring, and perform root cause analysis, we obviously use the Lumigo platform, a monitoring and debugging tool built from the ground up for serverless. You too can use Lumigo for free — get your free account.
On Wednesday, Feb 24, I’ll be hosting a live webinar in which I go in-depth on using CI/CD with serverless. Please join me! Register here.
Get a free cool serverless t-shirt!
All webinar attendees who connect a Lumigo tracer to their Lambdas will receive one of these cool serverless-themed t-shirts.