Zipkin vs. Jaeger: What Is the Difference?

What Is Zipkin?

Zipkin is a distributed tracing system for collecting and visualizing trace data within and between services. Initially developed at Twitter and based on Google’s Dapper paper, Zipkin is now available as an open source tool.

You can use Zipkin to perform forensic investigations without having to reconstruct application flows from log data. It has a Java-based architecture consisting of four components:

A collector: validates incoming data and passes it to the relevant storage.
A storage service: you can use MySQL, Cassandra, Elasticsearch, and various other options.
A search service: lets users query and retrieve trace data from the database.
A web user interface (UI): displays query results.
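
The way these four components hand data to each other can be illustrated with a small sketch. This is not Zipkin's code: the class names, the Span fields, and the validation rule below are all hypothetical, and an in-memory dictionary stands in for a real storage backend.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Span:
    # Hypothetical, simplified span record (not Zipkin's real data model).
    trace_id: str
    span_id: str
    name: str
    service: str
    duration_us: int


class InMemoryStorage:
    """Stand-in for the storage service (MySQL, Cassandra, Elasticsearch, ...)."""

    def __init__(self) -> None:
        self._by_trace: Dict[str, List[Span]] = {}

    def save(self, span: Span) -> None:
        self._by_trace.setdefault(span.trace_id, []).append(span)

    def load(self, trace_id: str) -> List[Span]:
        return list(self._by_trace.get(trace_id, []))


class Collector:
    """Validates incoming spans and passes valid ones to storage."""

    def __init__(self, storage: InMemoryStorage) -> None:
        self._storage = storage

    def accept(self, span: Span) -> bool:
        # Assumed validation rule: IDs must be present, duration non-negative.
        if not span.trace_id or not span.span_id or span.duration_us < 0:
            return False
        self._storage.save(span)
        return True


class SearchService:
    """Lets callers retrieve all spans that belong to one trace."""

    def __init__(self, storage: InMemoryStorage) -> None:
        self._storage = storage

    def get_trace(self, trace_id: str) -> List[Span]:
        return self._storage.load(trace_id)


def render_trace(spans: List[Span]) -> str:
    """Stand-in for the web UI: one text line per span."""
    return "\n".join(f"{s.service}: {s.name} ({s.duration_us} us)" for s in spans)
```

In this sketch, a span rejected by the collector never reaches storage, so the search service and the UI only ever see validated data.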

What Is Jaeger?

Jaeger is a distributed software tracing system that monitors and troubleshoots microservices-based systems. Initially developed by Uber Technologies, Jaeger is now available as an open source tool.

You can use Jaeger for distributed context propagation and distributed transaction monitoring. It can help you run root cause analysis and service dependency analysis, as well as optimize performance or latency.
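
To make context propagation more concrete, here is a minimal sketch of how a trace context might be carried between services in an HTTP header. It follows the uber-trace-id header layout used by Jaeger's native clients (trace ID, span ID, parent span ID, and flags separated by colons), but the SpanContext class and the inject, extract, and child_of helpers are hypothetical names written for this illustration, not the Jaeger client API.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional

HEADER = "uber-trace-id"  # header name used by Jaeger's native propagation format


@dataclass(frozen=True)
class SpanContext:
    trace_id: str   # hex string shared by every span in the trace
    span_id: str    # hex string identifying this span
    parent_id: str  # hex string of the parent span, "0" for a root span
    flags: int      # bit 0x01 set means the trace is sampled


def inject(ctx: SpanContext, headers: Dict[str, str]) -> None:
    """Write the context into outgoing request headers."""
    headers[HEADER] = f"{ctx.trace_id}:{ctx.span_id}:{ctx.parent_id}:{ctx.flags:x}"


def extract(headers: Dict[str, str]) -> Optional[SpanContext]:
    """Read the context from incoming headers; None if absent or malformed."""
    raw = headers.get(HEADER)
    if raw is None:
        return None
    parts = raw.split(":")
    if len(parts) != 4:
        return None
    trace_id, span_id, parent_id, flags = parts
    try:
        flag_bits = int(flags, 16)
    except ValueError:
        return None
    return SpanContext(trace_id, span_id, parent_id, flag_bits)


def child_of(parent: SpanContext) -> SpanContext:
    """Start a new span in the same trace, keeping the sampling decision."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.span_id, parent.flags)
```

A downstream service would call extract on the incoming request, create a child context, and inject that child into any calls it makes, which is what lets the tracing backend stitch the spans into one trace.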

This is part of a series of articles about Zipkin.


Zipkin Tracing vs. Jaeger Tracing

Integrating Frameworks and Libraries

Zipkin and Jaeger take different approaches to framework support. Zipkin supports popular frameworks in its official clients and leaves smaller libraries, such as database drivers, to community instrumentation. Jaeger relies on OpenTracing libraries for instrumentation, allowing you to use various contributed projects.

Zipkin and Jaeger both support drop-in instrumentation for major frameworks such as Python’s Django, Express.js in Node.js, and Java’s Spring. Jaeger also uses the opentracing-contrib project to provide instrumentation for additional libraries and protocols, such as the AWS SDK, gRPC, and Thrift, in various languages.

Deployment and Operations

Key aspects employed by both Zipkin and Jaeger include:

Instrumented applications send traces to a collector, which writes the data to a specified data store.
A query service provides the API behind the user interface (for example, see the Jaeger Query Service).
Multiple storage backends are supported, including Cassandra and Elasticsearch.
You need to run and monitor the system's components and manage the data store.
Both expose metrics that Prometheus can scrape, which you can use for monitoring purposes.
You can offload data maintenance to a hosted Elasticsearch service, which is generally easier to find as a hosted offering than Cassandra. If you host the data store on your own, you’re responsible for maintaining it.

Zipkin and Jaeger differ in how they package and deploy components. Here are key differences:

Jaeger is a Cloud Native Computing Foundation (CNCF) project and treats Kubernetes as its preferred deployment platform. It offers official Helm charts and Kubernetes templates for deploying its collector, agent, UI, and query API. Additionally, it integrates with service proxies, such as Istio and Envoy, to support easier call tracing across containers.

Zipkin provides a unified process that encompasses all components, including a collector, data store, query API, and user interface, and can run as a Java program or a Docker image. This makes deployment easier, but its documentation is limited to a readme. It does not offer full deployment documentation like Jaeger does.

Architecture

Zipkin and Jaeger both expose metrics that Prometheus can collect. Both let you either offload data store maintenance to a hosted Elasticsearch or Cassandra service or run the data store yourself. However, this is where their architectural similarities end.

Zipkin architecture

Zipkin is written in Java, supporting Java 6 and later versions. It can use Apache Thrift, a binary communication protocol, to encode span data, and can leverage both Elasticsearch and Cassandra as a scalable back end. Zipkin runs as a unified process that includes all components, making deployment easier. Instrumented applications deliver data to its collectors through HTTP, Kafka, or Scribe.
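
As an illustration of the HTTP transport, the sketch below builds a request that reports one span to a Zipkin collector using the JSON encoding of Zipkin's v2 API. It assumes a Zipkin server listening on localhost port 9411; the helper names and the example IDs are hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request
from typing import List, Optional

# Assumption: a Zipkin server is reachable locally on its usual port.
ZIPKIN_URL = "http://localhost:9411/api/v2/spans"


def build_span(trace_id: str, span_id: str, name: str, service: str,
               start_us: int, duration_us: int,
               parent_id: Optional[str] = None) -> dict:
    """Build one span as a dict in the Zipkin v2 JSON shape.

    Timestamps and durations are expressed in microseconds.
    """
    span = {
        "traceId": trace_id,
        "id": span_id,
        "name": name,
        "timestamp": start_us,
        "duration": duration_us,
        "localEndpoint": {"serviceName": service},
    }
    if parent_id is not None:
        span["parentId"] = parent_id
    return span


def build_request(spans: List[dict], url: str = ZIPKIN_URL) -> urllib.request.Request:
    """Wrap a list of spans in an HTTP POST request with a JSON body."""
    body = json.dumps(spans).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result of build_request to urllib.request.urlopen would send the spans to the collector, provided a server is actually running at the assumed address. In practice an instrumentation library usually handles this reporting for you.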

Jaeger architecture

Jaeger is written in Go, so its binaries do not require additional dependencies to be installed on the host and avoid the overhead of a language interpreter or virtual machine (VM). It uses a similar architecture to Zipkin, with a query service, web UI, collectors, and clients. However, it is not a single process, and it deploys an agent on each host to aggregate data locally.

Jaeger’s agent receives data over a User Datagram Protocol (UDP) connection, batches it, and sends it to a collector, which stores it in Cassandra or Elasticsearch. The query service accesses the data store directly and passes the information to the web UI.
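
The agent-to-collector flow can be sketched as a simple batching buffer. This is only an illustration of the batching idea: the class names and the batch size are hypothetical, spans are plain dictionaries, and the UDP and storage layers of a real deployment are replaced by direct function calls and a list.

```python
from typing import Callable, List


class Collector:
    """Stand-in collector that appends received spans to an in-memory store."""

    def __init__(self) -> None:
        self.store: List[dict] = []
        self.batches_received = 0

    def handle_batch(self, batch: List[dict]) -> None:
        self.batches_received += 1
        self.store.extend(batch)


class BatchingAgent:
    """Buffers spans locally and forwards them to a collector in batches."""

    def __init__(self, send_batch: Callable[[List[dict]], None],
                 max_batch_size: int = 3) -> None:
        if max_batch_size < 1:
            raise ValueError("max_batch_size must be at least 1")
        self._send_batch = send_batch
        self._max_batch_size = max_batch_size
        self._buffer: List[dict] = []

    def receive(self, span: dict) -> None:
        """Accept one span from a client; flush when the buffer is full."""
        self._buffer.append(span)
        if len(self._buffer) >= self._max_batch_size:
            self.flush()

    def flush(self) -> None:
        """Send any buffered spans to the collector and clear the buffer."""
        if self._buffer:
            self._send_batch(list(self._buffer))
            self._buffer.clear()
```

Batching on the host means clients can hand off spans cheaply, while the collector receives fewer, larger requests. A real agent would typically also flush on a timer so that spans do not wait indefinitely in a partly filled buffer.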

By default, Jaeger samples 0.1% of the traces (1 in 1,000) passing through each client using probabilistic sampling. It also supports adaptive sampling, which uses additional context to improve sampling decisions. You can change the percentage of traces that are sampled by reconfiguring the sampling settings.
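
Probabilistic sampling at a 0.1% rate can be sketched as follows. This is one common way to implement the idea, not Jaeger's actual code: the decision is derived from the trace ID, so every service that sees the same trace reaches the same decision. The class name and the assumption of 64-bit integer trace IDs are hypothetical choices for this example.

```python
class ProbabilisticSampler:
    """Samples a fixed fraction of traces based on the trace ID."""

    ID_SPACE = 2 ** 64  # assumption: trace IDs are uniform 64-bit integers

    def __init__(self, rate: float) -> None:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be between 0.0 and 1.0")
        self.rate = rate
        # IDs below this boundary are sampled; the boundary covers
        # `rate` of the whole ID space.
        self._boundary = int(rate * self.ID_SPACE)

    def is_sampled(self, trace_id: int) -> bool:
        return (trace_id % self.ID_SPACE) < self._boundary


# A rate of 0.001 corresponds to the 0.1% figure mentioned above.
default_sampler = ProbabilisticSampler(0.001)
```

Because the decision depends only on the trace ID and the rate, raising the rate to 1.0 samples every trace and lowering it to 0.0 samples none, with no other state to coordinate between services.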

Limitations

Both Zipkin and Jaeger have limitations you should be aware of.

Zipkin limitations

Uses a centralized design that is less modular than Jaeger's. This can make it slower and less flexible. That may not be an issue for smaller systems, but as you scale up, it can become significant.
Core components are written in Java, which suits organizations that rely on Java but can be a drawback for teams that do not.
Uses in-memory storage by default, so if the system fails or is powered off, all recorded data is lost unless data is offloaded to persistent storage like Elasticsearch.
Some common languages, such as Python, are covered by community-maintained libraries rather than official ones.

Jaeger limitations

A newer tool that is less mature, and may not be accepted by corporate IT.
Uses Go as its primary language, which can make it more difficult to use for those not familiar with the language.
A more advanced architecture, which has advantages for performance, scalability, and reliability, but is relatively complex and difficult to maintain.
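Like Zipkin, it uses in-memory storage by default, which can lead to data loss unless a persistent data store is set up.
The Jaeger query API is not compatible with the Zipkin API, although the Jaeger collector can be configured to accept spans in Zipkin formats.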


Zipkin or Jaeger: Which Is Right for You?

Both Zipkin and Jaeger are excellent choices for collecting and managing distributed tracing data, and they have similar functionality. Both of them offer:

Support for distributed tracing libraries: OpenTracing, OpenCensus, and OpenTelemetry.
A wide range of tool integrations and extensions.
Support for containerization and virtualization.
In-memory storage by default, which can result in data loss unless you configure a persistent data store.

Here are some guidelines for choosing the best tool for your project:

Zipkin is a more mature tool, with a bigger and more active community. It is used in a wide range of industries and is well suited for the enterprise IT world, which primarily uses Java. Choose Zipkin if you want to go with the mainstream, widely adopted solution.

Jaeger is a newer tool with a smaller community, but it is backed by the CNCF, which instills trust in the project. It provides higher speed, flexibility, and scalability, using a distributed architecture. It also supports tracing in more languages. However, it can be more complex to use and, as a less mature project, presents some risk for enterprise environments. Choose Jaeger if you are looking for a cutting-edge solution and are willing to accept a few rough edges.

Distributed Tracing with Lumigo

Lumigo is a cloud native observability tool, purpose-built to navigate the complexities of microservices. Through automated distributed tracing, Lumigo stitches together the many components of a containerized application and tracks every service in a request. When an error or failure occurs, you see not only the impacted service but the entire request in one visual map, so you can easily understand the root cause, limit impact, and prevent future failures.

With deep debugging data into applications and infrastructure, developers have all the information they need to monitor and troubleshoot their containers without any of the manual work:

Automatic correlation of logs, metrics and traces into end-to-end visualization of requests and full system map of applications
Monitor and debug third-party APIs and managed services (e.g., Amazon DynamoDB, Twilio, Stripe)
Go from alert (in Slack, PagerDuty and other workflow tools) to root cause analysis in one click
Understand system behavior and explore performance and cost issues
