Serverless CI/CD: How we added a staging step

Mar 17 2020

Unit tests and integration tests are vitally important, but sometimes even those aren’t sufficient to ensure that critical services in your application will function smoothly in production.

In those cases, adding a staging step to our CI/CD process allows us to test a feature with real data in a less supervised environment. For example, here at Lumigo we decided to use it for our Node.js tracer.

In this article, we’ll share how we use serverless components to automate deployment to the staging environment and the release that follows.

Before we get started, you should already have the following in place:

Final Release – a flow for releasing the final version to production
Deploy Staging – a flow for deploying the staging environment

Our original CI/CD architecture

Before adding a staging environment, our CI/CD flow at Lumigo consisted of two parts:

Before a feature was merged to master, it had to pass unit tests and integration tests.
When the code was merged to master, a CircleCI job was triggered and released the new version to npm (“final release”).

For more on this, read Development workflow for serverless applications.

Our CI/CD architecture with staging

And this is how the CI/CD process looks with the staging step added:

Release beta and trigger

This is the first step of the process and, in our case, a CircleCI job is triggered on merge to master. It runs a bash file consisting of three parts:

Release the beta version.
Deploy the staging environment with the new beta version. A notification is triggered if errors occur in this environment.
Release the final version with a delay. The 2-hour delay gives the staging environment enough time to run with the new beta version.

Let’s go into those steps in more detail:

1. Releasing the beta version to npm

With npm we can add a “beta” tag to the release.

In our package.json, the version is usually the release version and not the beta version. We need to update this version to be the beta version before the release to npm:
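As a rough illustration of this step, the sketch below derives a beta version from the release version found in package.json. The `-beta.<n>` naming scheme, the `toBetaVersion` and `withBetaVersion` helpers, and the `my-tracer` package name are all assumptions made for the example, not necessarily what our release script does.

```javascript
// Hypothetical helper: derive a beta (prerelease) version string from the
// release version stored in package.json. The "-beta.<n>" scheme is an assumption.
function toBetaVersion(releaseVersion, buildNumber) {
  if (!/^\d+\.\d+\.\d+$/.test(releaseVersion)) {
    throw new Error(`Expected a plain x.y.z version, got "${releaseVersion}"`);
  }
  if (!Number.isInteger(buildNumber) || buildNumber < 0) {
    throw new Error("buildNumber must be a non-negative integer");
  }
  return `${releaseVersion}-beta.${buildNumber}`;
}

// Return a copy of the parsed package.json with its version replaced.
function withBetaVersion(pkg, buildNumber) {
  return { ...pkg, version: toBetaVersion(pkg.version, buildNumber) };
}

// Example with a hypothetical package and CI build number.
const examplePackage = { name: "my-tracer", version: "1.4.2" };
console.log(withBetaVersion(examplePackage, 57).version);
```

Once package.json carries the beta version, running `npm publish --tag beta` publishes it under the "beta" dist-tag, so the "latest" tag keeps pointing at the last final release.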

Now, in npm, under “Versions”, you should see something like this:

If your project isn’t an npm package, you can release the beta version in a different way.

2. Trigger deploy-staging

The beta version is now in npm, so the staging environment should start using it.

We should trigger the CircleCI job that deploys the staging environment:
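A minimal sketch of such a trigger, assuming the CircleCI v1.1 job-trigger endpoint and a `build_parameters[CIRCLE_JOB]` body. Treat the exact URL shape, the body, and the authentication style as assumptions to check against the CircleCI documentation. The `buildJobTriggerRequest` helper and the org, repo, and job names are hypothetical.

```javascript
// Build (but do not send) the HTTP request that asks CircleCI to run one job.
// The org, repo and job names used below are hypothetical placeholders.
function buildJobTriggerRequest({ token, vcs, org, repo, branch, job }) {
  const project = [vcs, org, repo].map(encodeURIComponent).join("/");
  return {
    url: `https://circleci.com/api/v1.1/project/${project}/tree/${encodeURIComponent(branch)}`,
    method: "POST",
    headers: {
      // Assumption: the API token is sent as the basic-auth username with an empty password.
      Authorization: `Basic ${Buffer.from(`${token}:`).toString("base64")}`,
      "Content-Type": "application/json",
    },
    // CIRCLE_JOB names the job from the CircleCI config file that should run.
    body: JSON.stringify({ build_parameters: { CIRCLE_JOB: job } }),
  };
}

const deployStagingRequest = buildJobTriggerRequest({
  token: process.env.CIRCLECI_TOKEN || "dummy-token",
  vcs: "github",
  org: "my-org",
  repo: "staging-env",
  branch: "master",
  job: "deploy-staging",
});
console.log(deployStagingRequest.method, deployStagingRequest.url);
```

The returned object can be handed to any HTTP client. Keeping the request construction separate from the network call makes the trigger easy to check without contacting CircleCI.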

3. Trigger release-with-delay

We want to run the release flow (with delay). We will do that by triggering the Lambda step-function-invoker:

Install beta on staging and monitor it

Use beta version

In the package.json file, we need to change the dependency to use the beta version:
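One way to express this change, sketched under the assumption that the beta was published under the "beta" dist-tag: npm accepts a dist-tag as a dependency specifier, so the dependency can simply point at that tag. The `useDistTag` helper and the `my-tracer` package name are hypothetical.

```javascript
// Return a copy of a parsed package.json in which one dependency points at
// an npm dist-tag (for example "beta") instead of a version range.
function useDistTag(pkg, dependencyName, distTag = "beta") {
  if (!pkg.dependencies || !(dependencyName in pkg.dependencies)) {
    throw new Error(`"${dependencyName}" is not listed in dependencies`);
  }
  return {
    ...pkg,
    dependencies: { ...pkg.dependencies, [dependencyName]: distTag },
  };
}

// Hypothetical package.json of a service running in the staging environment.
const stagingPackage = {
  name: "staging-service",
  dependencies: { "my-tracer": "^1.4.2", lodash: "^4.17.21" },
};
console.log(useDistTag(stagingPackage, "my-tracer").dependencies);
```

With the dependency set to the tag, the next `npm install` in the staging deployment resolves to whichever version currently carries that tag.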

If you aren’t using Node.js, edit your project’s own requirements file instead.

Searching for errors in the staging environment

The next step is verifying that the staging environment works as expected and there aren’t any errors. In order to automate this process, we – of course – use Lumigo! It monitors your serverless application and sends you a notification when an error occurs. So, if there aren’t any problems, no manual work is needed. By default, errors are exceptions in Lambdas, but you can configure other types of errors as well.

If you want, you can also manually check the status of specific Lambdas in the staging environment by using CloudWatch.

What can we do if there are errors in the staging environment?

A running Step Functions execution can be stopped. So if we see errors in staging and we don’t want to release our version, we can just halt the execution of the release-with-delay Step Function. We can stop it from the AWS console by selecting the running execution and clicking “Stop execution”.

What if we have to release now and can’t wait for the delay to finish?

Sometimes we need to release as soon as possible, for example to ship a bugfix. In those cases, we can simply stop the execution of the release-with-delay Step Function, then manually trigger the final-release Lambda.

Building the final release flow

As we’ve already discussed, there are several things we want to achieve with our final release flow:

A delay of 2 hours.
It should be stoppable after it’s started.
It should release automatically if not stopped manually.
There should be an option to disable the delay if necessary.

Step-function-invoker lambda

First, we need to define the step-function-invoker Lambda. It makes sure that only one execution of this Step Function is running at any time, so that releases don’t collide.
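The single-execution guard could look roughly like the sketch below. The `sfn` object is a hypothetical adapter: in a real Lambda its two methods would wrap the Step Functions ListExecutions call (filtered to running executions) and the StartExecution call. Skipping the start when an execution is already running is one possible policy and an assumption here, and the in-memory fake exists only so the sketch runs on its own.

```javascript
// Start the release-with-delay state machine only when no execution is running.
// `sfn` is a hypothetical adapter exposing listRunningExecutions and startExecution.
async function invokeReleaseWithDelay(sfn, stateMachineArn, input = {}) {
  const running = await sfn.listRunningExecutions(stateMachineArn);
  if (running.length > 0) {
    // Assumed policy: keep the pending release and do not start a second one.
    return { started: false, reason: "already-running", runningArn: running[0] };
  }
  const executionArn = await sfn.startExecution(stateMachineArn, JSON.stringify(input));
  return { started: true, executionArn };
}

// In-memory stand-in for the adapter, used only to demonstrate the guard.
function makeFakeSfn() {
  const running = [];
  return {
    async listRunningExecutions() {
      return [...running];
    },
    async startExecution(arn) {
      const executionArn = `${arn}:execution-${running.length + 1}`;
      running.push(executionArn);
      return executionArn;
    },
  };
}

const fakeSfn = makeFakeSfn();
invokeReleaseWithDelay(fakeSfn, "arn:demo:release-with-delay")
  .then(() => invokeReleaseWithDelay(fakeSfn, "arn:demo:release-with-delay"))
  .then((second) => console.log("second call started:", second.started));
```

Note that a check followed by a start is not atomic: two invocations arriving at the same moment could both pass the check. If that matters, limiting the invoker Lambda to a single concurrent execution is one way to narrow the window.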

Final-release Lambda

If no issues occur in staging that prompt us to stop the execution of the step function, the final-release Lambda will release our version automatically after the set delay.

Let’s define the final-release lambda:
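A minimal sketch of what such a handler might look like, assuming the token reaches the Lambda as a `CIRCLECI_TOKEN` environment variable. The `FINAL_RELEASE_TRIGGER_URL` variable, the `final-release` job name, the `build_parameters` body, and the injected `postJson` function are all hypothetical. `postJson` stands in for whichever HTTP client performs the authenticated POST to CircleCI.

```javascript
// Build a Lambda handler that asks CircleCI to run the final release job.
// `env` and `postJson` are injected so the handler can run without network access.
function makeFinalReleaseHandler({ env, postJson }) {
  return async function handler() {
    const token = env.CIRCLECI_TOKEN;
    if (!token) {
      throw new Error("CIRCLECI_TOKEN is not set");
    }
    const response = await postJson(env.FINAL_RELEASE_TRIGGER_URL, token, {
      build_parameters: { CIRCLE_JOB: "final-release" }, // hypothetical job name
    });
    if (response.status < 200 || response.status >= 300) {
      // Throwing marks the invocation as failed, so the failure is visible upstream.
      throw new Error(`CircleCI trigger failed with HTTP ${response.status}`);
    }
    return { triggered: true };
  };
}

// Demonstration with a fake HTTP client that always answers 201.
const demoHandler = makeFinalReleaseHandler({
  env: { CIRCLECI_TOKEN: "dummy-token", FINAL_RELEASE_TRIGGER_URL: "https://example.invalid/trigger" },
  postJson: async () => ({ status: 201 }),
});
demoHandler().then((result) => console.log(result));
```

Failing loudly on a missing token or a non-2xx response means a broken trigger surfaces as a failed execution rather than a release that silently never happens.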

We also need to configure a CIRCLECI_TOKEN as an environment variable in CircleCI.

Make sure the version of your CircleCI config file is supported: https://circleci.com/docs/2.0/api-job-trigger/. At the time of writing, version 2.1 isn’t supported.

If you aren’t using CircleCI, replace the code in the handler so that it calls your final release flow.

Release-with-delay Step Function

We are using a Step Function here because it allows us to create a delay while giving us the option to cancel the process after it has started. You can read more about step functions here.

Let’s define the step-function in our serverless.yml file:
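To show the shape of such a state machine, the sketch below builds an Amazon States Language definition as a plain object: a Wait state followed by a Task state that invokes the final-release Lambda. The state names, the `buildReleaseWithDelayDefinition` helper, and the example ARN are assumptions. The same structure can be written in YAML inside serverless.yml.

```javascript
// Build an Amazon States Language definition: wait, then run the final release.
// Passing delaySeconds = 0 skips the Wait state, which disables the delay.
function buildReleaseWithDelayDefinition({ finalReleaseLambdaArn, delaySeconds = 2 * 60 * 60 }) {
  if (!Number.isInteger(delaySeconds) || delaySeconds < 0) {
    throw new Error("delaySeconds must be a non-negative integer");
  }
  const comment = "Release the final version after a delay";
  const finalRelease = { Type: "Task", Resource: finalReleaseLambdaArn, End: true };
  if (delaySeconds === 0) {
    return { Comment: comment, StartAt: "FinalRelease", States: { FinalRelease: finalRelease } };
  }
  return {
    Comment: comment,
    StartAt: "WaitBeforeRelease",
    States: {
      WaitBeforeRelease: { Type: "Wait", Seconds: delaySeconds, Next: "FinalRelease" },
      FinalRelease: finalRelease,
    },
  };
}

// Example with a placeholder Lambda ARN.
const releaseDefinition = buildReleaseWithDelayDefinition({
  finalReleaseLambdaArn: "arn:aws:lambda:us-east-1:123456789012:function:final-release",
});
console.log(JSON.stringify(releaseDefinition, null, 2));
```

Stopping the execution while it sits in the Wait state means the Task state never runs, which is what makes the delayed release cancellable.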

Conclusion

There you have it: a serverless-focused CI/CD flow that includes a staging environment, is composed entirely of serverless components, and can easily be expanded to include more services.

A staging environment in the CI/CD flow often reproduces production behavior better than tests can, so we see it as a vitally important step when it comes to critical services.

Over the past two years the R&D team here at Lumigo has gained a wealth of hard-earned experience in the particular requirements of CI/CD as it pertains to serverless development, and we’ll continue to share what we learn in the serverless trenches as we hone our approach.
