Metaphor is an early-stage startup spun out of LinkedIn three years ago. The founders built tools inside LinkedIn and quickly discovered that one of those tools was rapidly growing in popularity. Metaphor provides a data catalog as a SaaS product to companies of various sizes. In other words, Metaphor connects to the data ecosystem of their customers, pulls in the necessary metadata from the index, and makes it searchable along with the lineage of the data itself.
As Metaphor asked their developers to own their code in production, they found they lacked the visibility and context needed to run at optimal efficiency. Metaphor searched for a solution to complement the Amazon CloudWatch logging they had in place. While CloudWatch included much of the data their developers needed to troubleshoot issues across different services and accounts, the developers didn’t always know what was going wrong within their system. Many errors were missed or overlooked because no alerts were raised. This lack of visibility meant that the customer was frequently the first to alert Metaphor that something was broken. The engineers would then have to dig through the logs to confirm that something was, in fact, broken and to figure out what was causing the issue.
According to Mars, co-founder and CTO of Metaphor, “There’s a lot of things that make it difficult for us specifically as a SaaS provider. In our most protective and isolated settings, we provide individual AWS accounts to our customers in completely isolated environments. Because of this, we end up with a lot of AWS accounts, which makes monitoring across all of them a nightmare. We needed to make it easy to aggregate all of that information to a single place where we can view and analyze the logs quickly when needed.”
Lumigo Empowers Metaphor’s Engineers to Run Their Own Code in Production
With a small team of engineers, Metaphor needed to build an environment that was easy to manage and could scale quickly. Their goal from the beginning was to make their engineers as productive as possible by leveraging technology. To that end, Metaphor decided to deploy their SaaS product on AWS using both AWS Lambda and Amazon ECS. According to Mars, one thing AWS does extremely well is infrastructure. As he put it, “They guarantee that your servers are up and running and that nothing bad happens. AWS does an excellent job solving this for us.”
In addition to AWS, Metaphor looked for a solution to empower their developers to run their own code in production. After trying several different tools, Metaphor chose Lumigo as their preferred observability tool. According to Mars, one of the biggest reasons why Metaphor chose Lumigo was its comprehensive monitoring for AWS Lambda and Amazon ECS, which gave Metaphor an extra edge over its competition.
Metaphor relies heavily on Lumigo for alerting their engineers when an issue arises. According to Mars, “Lumigo has excellent alerting support for Lambda and ECS. We find Lumigo to be super helpful in prioritizing issues with programmatic alerts. If something completely crashes and burns, we want to know that right away, but there are other things that might be minor issues that don’t require immediate attention. If you just put it in a log, even as an error log or warning log, nobody’s gonna look at it. But if the alert channel keeps ringing with errors, it will get people’s attention.”
Metaphor Eliminates Customer-Reported Incidents
With Lumigo, Metaphor has been able to aggregate all of the troubleshooting data they need across all of their AWS accounts into a single observability solution so they can quickly resolve issues. As a result of implementing Lumigo, Metaphor has been able to save both time and money by reducing their dependency on logging while almost entirely eliminating customer-reported incidents.
Reduced Dependency on Logging:
According to Mars, “With Lumigo, the logs are correlated directly with the traces in real-time. Because of that, we have all of the data we need to troubleshoot issues in Lumigo and hardly ever need to go back to the logs. Lumigo, to a great extent, has removed our logging dependency for our production environments.”
Metaphor has seen a vast improvement in overall customer experience and satisfaction.
By using Lumigo’s programmatic alerts, Metaphor has ensured they never miss an issue and that they are able to quickly see which issues are potentially customer-impacting and which can be dealt with later. Because of this, Metaphor has been able to almost entirely eliminate customer-reported incidents, often tackling issues and deploying a fix before customers are impacted.
Single View Across All AWS Accounts:
With Lumigo, Metaphor has been able to reduce their overall mean time to resolve issues. Metaphor is now able to visualize their entire platform across multiple AWS accounts, enabling them to quickly identify where an issue is, understand its upstream and downstream impacts, and ensure that a fix is implemented without delay.
Lumigo is an AWS Partner Network Advanced Technology Partner. Lumigo is a troubleshooting and observability platform that autonomously deploys OpenTelemetry in under 5 minutes with a single click, automatically capturing and contextualizing all of the data developers need to troubleshoot microservice issues in production. Lumigo is the only distributed tracing platform that enriches traces with complete in-context request and response payloads and correlates them to the relevant logs and metrics, enabling developers to resolve issues up to 80% faster.