OpenTelemetry Tracing: The Basics and a Quick Tutorial


What Is OpenTelemetry Tracing? 

OpenTelemetry is an open source observability framework managed by the Cloud Native Computing Foundation (CNCF). One of the main functions of OpenTelemetry is tracing, a method for tracking the execution of requests as they move through a distributed system.

OpenTelemetry tracing records information about operations performed by an application, such as function calls or API requests. A trace represents the end-to-end path of a single request and is made up of spans, which are timed records of individual units of work. Tracing provides visibility into the performance, behavior, and interdependencies of applications, allowing you to diagnose issues, monitor performance, and understand complex systems.

OpenTelemetry Tracing addresses the challenges of monitoring distributed systems by allowing spans from different services to be linked into a coherent trace, offering a full picture of the request lifecycle.

Another important benefit of OpenTelemetry Tracing is that it is vendor-neutral and can integrate with various backend systems for storing and analyzing traces. This allows organizations to perform tracing across a wide range of platforms and technologies used in their application environment.

Understanding the OpenTelemetry Tracing API 

TracerProvider

TracerProvider is the main entry point for creating Tracers in OpenTelemetry. It manages a set of Tracer instances that correspond to different instrumentation libraries.

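A TracerProvider is typically configured once, at application startup, with options such as a resource (attributes describing the service), an ID generator, a sampler, and span processors. Each Tracer it hands out is identified by an instrumentation scope: a name, an optional version, and an optional schema URL. Because all of its Tracers share this configuration, spans from every instrumentation library are sampled, processed, and exported consistently.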

The TracerProvider does not hold the active Span itself; in OpenTelemetry, the currently active Span is stored in the Context, which travels with the execution of your code. What the provider does own is the pipeline that finished Spans pass through: its span processors and exporters, which deliver every recorded Span to your tracing backend.
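The division of labor between a TracerProvider and its processors can be seen with the SDK implementation. The sketch below assumes the opentelemetry-sdk and opentelemetry-sdk-testing artifacts are on the classpath (their versions are managed by the BOM used in the tutorial below). InMemorySpanExporter is a test exporter that simply collects finished spans; the class, library, and span names are illustrative.

```java
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.sdk.testing.exporter.InMemorySpanExporter;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.SimpleSpanProcessor;

final class TracerProviderSketch {
  // Builds an SDK TracerProvider that hands every finished span to the given exporter.
  static SdkTracerProvider buildProvider(InMemorySpanExporter exporter) {
    return SdkTracerProvider.builder()
        .addSpanProcessor(SimpleSpanProcessor.create(exporter))
        .build();
  }

  public static void main(String[] args) {
    InMemorySpanExporter exporter = InMemorySpanExporter.create();
    SdkTracerProvider provider = buildProvider(exporter);
    Tracer tracer = provider.get("example-library", "1.0.0");

    Span span = tracer.spanBuilder("work").startSpan();
    // The span is still open, so the exporter has not received anything yet.
    System.out.println("exported before end(): " + exporter.getFinishedSpanItems().size());

    span.end();
    // end() passes the span through the processor to the exporter.
    System.out.println("exported after end(): " + exporter.getFinishedSpanItems().size());

    provider.shutdown();
  }
}
```

In a real service, the in-memory exporter would be replaced by an exporter for your backend (for example, OTLP), and SimpleSpanProcessor would usually be replaced by BatchSpanProcessor to avoid exporting on the request path.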

Tracer

The Tracer is responsible for starting new Spans, which represent individual units of work within a trace. By default, a new Span takes the Span that is active in the current Context as its parent, so the trace reflects the execution flow of the application.

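Once a Span has been made current, code further down the call stack can reach it without passing it around. In the Java API this is done through the static Span.current() method rather than through the Tracer. This allows you to modify the state of the Span, attach additional information, or create child Spans that represent sub-tasks within the current unit of work.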
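The sketch below shows a helper that annotates whatever span is current without receiving it as a parameter. It assumes opentelemetry-sdk and opentelemetry-sdk-testing on the classpath; loadUser, handleRequest, and the app.user.id attribute are hypothetical names chosen for illustration.

```java
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import io.opentelemetry.sdk.testing.exporter.InMemorySpanExporter;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.data.SpanData;
import io.opentelemetry.sdk.trace.export.SimpleSpanProcessor;

final class CurrentSpanSketch {
  static final AttributeKey<String> USER_ID = AttributeKey.stringKey("app.user.id");

  // Hypothetical helper deep in the call stack: it annotates whichever span is
  // active in the current Context. With no active span, Span.current() returns
  // a non-recording span, so these calls are harmless no-ops.
  static void loadUser(String userId) {
    Span current = Span.current();
    current.setAttribute(USER_ID, userId);
    current.addEvent("user loaded");
  }

  static void handleRequest(Tracer tracer) {
    Span span = tracer.spanBuilder("handleRequest").startSpan();
    try (Scope scope = span.makeCurrent()) {
      loadUser("user-42");
    } finally {
      span.end();
    }
  }

  public static void main(String[] args) {
    InMemorySpanExporter exporter = InMemorySpanExporter.create();
    SdkTracerProvider provider = SdkTracerProvider.builder()
        .addSpanProcessor(SimpleSpanProcessor.create(exporter))
        .build();

    handleRequest(provider.get("example-library"));

    SpanData data = exporter.getFinishedSpanItems().get(0);
    System.out.println(data.getName() + " " + data.getAttributes().get(USER_ID));
    provider.shutdown();
  }
}
```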

Span

The Span is the basic unit of work in OpenTelemetry tracing. It represents an individual operation within a trace, such as a function call, a database query, or a network request. Each Span records its start and end time, a reference to its parent Span (if it has one), and associated events or attributes.

A Span has a name, which should be a concise, human-readable string that describes the operation. It also has a set of key-value pairs, known as attributes, that provide additional information about the operation. Events, representing something that occurred during the operation, can also be associated with a Span.

Spans are hierarchical, meaning they can have a parent Span and multiple child Spans. This allows you to capture the relationship between different operations.
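A child Span normally picks up its parent from the current Context, but the parent can also be set explicitly, which helps when work continues somewhere the parent is not current (for example, on another thread). This is a sketch assuming opentelemetry-sdk and opentelemetry-sdk-testing on the classpath; startChild and the span names are illustrative.

```java
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Context;
import io.opentelemetry.sdk.testing.exporter.InMemorySpanExporter;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.data.SpanData;
import io.opentelemetry.sdk.trace.export.SimpleSpanProcessor;

final class ExplicitParentSketch {
  // Starts a span whose parent is given explicitly rather than taken from
  // whatever happens to be current.
  static Span startChild(Tracer tracer, Span parent, String name) {
    return tracer.spanBuilder(name)
        .setParent(Context.current().with(parent))
        .startSpan();
  }

  public static void main(String[] args) {
    InMemorySpanExporter exporter = InMemorySpanExporter.create();
    SdkTracerProvider provider = SdkTracerProvider.builder()
        .addSpanProcessor(SimpleSpanProcessor.create(exporter))
        .build();
    Tracer tracer = provider.get("example-library");

    Span parent = tracer.spanBuilder("handleOrder").startSpan();
    Span child = startChild(tracer, parent, "reserveInventory");
    child.end();
    parent.end();

    // Each exported span records the ID of its parent (all zeros for the root).
    for (SpanData data : exporter.getFinishedSpanItems()) {
      System.out.println(data.getName() + " parent=" + data.getParentSpanId());
    }
    provider.shutdown();
  }
}
```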


Quick Tutorial: Implementing OpenTelemetry Tracing in Java

Prerequisites

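Add the OpenTelemetry Java API to your project using Maven or Gradle. For Maven, import the OpenTelemetry bill of materials (BOM), which manages the versions of all OpenTelemetry artifacts, and declare the opentelemetry-api dependency. Note that the API on its own produces no-op telemetry; to actually record and export spans, an application also needs the SDK (opentelemetry-sdk) and an exporter: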

<project>
  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-bom</artifactId>
        <version>1.33.0</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>
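  <dependencies>
    <dependency>
      <groupId>io.opentelemetry</groupId>
      <artifactId>opentelemetry-api</artifactId>
    </dependency>
    <!-- The SDK implements the API; without it, all tracing calls are no-ops -->
    <dependency>
      <groupId>io.opentelemetry</groupId>
      <artifactId>opentelemetry-sdk</artifactId>
    </dependency>
  </dependencies>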
</project>

Get a Tracer

To get a Tracer, you first obtain a TracerProvider, which is responsible for creating Tracers. Let’s take a look at how this is done:

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.api.trace.TracerProvider;

// Returns the globally registered provider (a no-op until an SDK is registered)
TracerProvider tracerProvider = GlobalOpenTelemetry.getTracerProvider();
Tracer tracer = tracerProvider.get("instrumentation-library-name", "1.0.0");

In the above code snippet, we import the necessary OpenTelemetry API classes and then get the globally registered TracerProvider. If no SDK has been registered globally, this provider returns no-op Tracers. We then use this provider to get a Tracer, specifying the name of the instrumentation library and its version. This Tracer can now be used to create Spans.
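A Tracer can also be created with an explicit version and schema URL through TracerProvider.tracerBuilder. This is a sketch assuming opentelemetry-sdk and opentelemetry-sdk-testing on the classpath; the library name, version, and schema URL are example values, and the in-memory exporter is only there to show that the scope travels with each span.

```java
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.api.trace.TracerProvider;
import io.opentelemetry.sdk.common.InstrumentationScopeInfo;
import io.opentelemetry.sdk.testing.exporter.InMemorySpanExporter;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.SimpleSpanProcessor;

final class TracerBuilderSketch {
  static final String SCHEMA_URL = "https://opentelemetry.io/schemas/1.21.0"; // example value

  // Works with any TracerProvider: the global one, a no-op, or an SDK instance.
  static Tracer buildTracer(TracerProvider provider) {
    return provider.tracerBuilder("instrumentation-library-name")
        .setInstrumentationVersion("1.0.0")
        .setSchemaUrl(SCHEMA_URL)
        .build();
  }

  public static void main(String[] args) {
    InMemorySpanExporter exporter = InMemorySpanExporter.create();
    SdkTracerProvider provider = SdkTracerProvider.builder()
        .addSpanProcessor(SimpleSpanProcessor.create(exporter))
        .build();

    buildTracer(provider).spanBuilder("operationName").startSpan().end();

    // The instrumentation scope is recorded on every span the Tracer creates.
    InstrumentationScopeInfo scope =
        exporter.getFinishedSpanItems().get(0).getInstrumentationScopeInfo();
    System.out.println(scope.getName() + " " + scope.getVersion() + " " + scope.getSchemaUrl());
    provider.shutdown();
  }
}
```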

Create a Span

A Span represents a single operation within a trace. It encapsulates the operation’s time, status, associated attributes, and other related information.

Here’s how to create a Span:

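import io.opentelemetry.api.trace.Span;
import io.opentelemetry.context.Scope;

Span span = tracer.spanBuilder("operationName").startSpan();
// Make the span current so nested spans and Span.current() can see it
try (Scope scope = span.makeCurrent()) {
  // your application code here
} finally {
  // Closing the Scope does not end the Span; end it explicitly
  span.end();
}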

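In this code, we use the spanBuilder method of our Tracer to create and start a new Span. The try-with-resources statement makes this Span the current Span for the contained code block and restores the previous context when the block exits. Closing the Scope does not end the Span, so the finally block calls span.end() explicitly; this records the end time even if the code throws an exception.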

Create Nested Spans

Let’s take our understanding a step further and create nested Spans. Nested Spans are useful when you want to trace a series of operations or a sequence of events within a single operation.

Here’s an example:

Span parentSpan = tracer.spanBuilder("parentOperation").startSpan();
try (Scope parentScope = parentSpan.makeCurrent()) {
  // your application code here
  
  Span childSpan = tracer.spanBuilder("childOperation").startSpan();
  try (Scope childScope = childSpan.makeCurrent()) {
    // your application code here
  } finally {
    childSpan.end();
  }
  
} finally {
  parentSpan.end();
}

In this example, we first create a parent Span. While it is current, we start a child Span, which automatically takes the current Span as its parent. Each Span is made current only for the block of code it contains, and each Span is ended in the finally block that follows its try-with-resources statement, so the child always ends before the parent.
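To confirm that the child Span was linked to its parent without any explicit wiring, the nested example above can be run against an SDK TracerProvider that keeps finished spans in memory. This is a sketch assuming opentelemetry-sdk and opentelemetry-sdk-testing on the classpath; NestedSpansSketch is an illustrative name.

```java
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import io.opentelemetry.sdk.testing.exporter.InMemorySpanExporter;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.data.SpanData;
import io.opentelemetry.sdk.trace.export.SimpleSpanProcessor;

final class NestedSpansSketch {
  // Same structure as the nested example: the child is started while the
  // parent is current, so the SDK reads the parent from Context.current().
  static void run(Tracer tracer) {
    Span parentSpan = tracer.spanBuilder("parentOperation").startSpan();
    try (Scope parentScope = parentSpan.makeCurrent()) {
      Span childSpan = tracer.spanBuilder("childOperation").startSpan();
      try (Scope childScope = childSpan.makeCurrent()) {
        // child work goes here
      } finally {
        childSpan.end();
      }
    } finally {
      parentSpan.end();
    }
  }

  public static void main(String[] args) {
    InMemorySpanExporter exporter = InMemorySpanExporter.create();
    SdkTracerProvider provider = SdkTracerProvider.builder()
        .addSpanProcessor(SimpleSpanProcessor.create(exporter))
        .build();

    run(provider.get("example-library"));

    // The child ends first, so it is exported first.
    for (SpanData data : exporter.getFinishedSpanItems()) {
      System.out.println(
          data.getName() + " spanId=" + data.getSpanId() + " parent=" + data.getParentSpanId());
    }
    provider.shutdown();
  }
}
```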

Add Span Attributes

OpenTelemetry allows you to add attributes to your Spans. These attributes can add context to your spans, making them more informative. Here’s how you can add attributes to your Spans:

Span span = tracer.spanBuilder("operationName").startSpan();
try (Scope scope = span.makeCurrent()) {
  span.setAttribute("myAttribute", "myValue");
  // your application code here
} finally {
  span.end();
}

In this example, we add an attribute named myAttribute with a value of myValue to our Span. We do this using the setAttribute method of the Span.
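Besides string values, setAttribute has overloads for long, double, and boolean values, and it also accepts typed AttributeKey constants. The sketch below defines keys once and sets attributes both when the span is created and afterwards. It assumes opentelemetry-sdk and opentelemetry-sdk-testing on the classpath; the app.order.* attribute names and processOrder are hypothetical.

```java
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import io.opentelemetry.sdk.testing.exporter.InMemorySpanExporter;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.data.SpanData;
import io.opentelemetry.sdk.trace.export.SimpleSpanProcessor;

final class TypedAttributesSketch {
  // Hypothetical keys, defined once and reused so names and types stay consistent.
  static final AttributeKey<String> ORDER_ID = AttributeKey.stringKey("app.order.id");
  static final AttributeKey<Long> ITEM_COUNT = AttributeKey.longKey("app.order.item_count");
  static final AttributeKey<Boolean> EXPRESS = AttributeKey.booleanKey("app.order.express");

  static void processOrder(Tracer tracer) {
    Span span = tracer.spanBuilder("processOrder")
        // Attributes set on the builder are already present when the sampler runs
        .setAttribute(ORDER_ID, "order-123")
        .startSpan();
    try (Scope scope = span.makeCurrent()) {
      // Typed values: a long and a boolean rather than strings
      span.setAttribute(ITEM_COUNT, 3L);
      span.setAllAttributes(Attributes.of(EXPRESS, true));
    } finally {
      span.end();
    }
  }

  public static void main(String[] args) {
    InMemorySpanExporter exporter = InMemorySpanExporter.create();
    SdkTracerProvider provider = SdkTracerProvider.builder()
        .addSpanProcessor(SimpleSpanProcessor.create(exporter))
        .build();

    processOrder(provider.get("example-library"));

    SpanData data = exporter.getFinishedSpanItems().get(0);
    System.out.println(data.getAttributes());
    provider.shutdown();
  }
}
```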

Context Propagation

In distributed systems, you often need to pass context from one service to another, such as tracing requests as they travel through various services.

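Here’s an example of how you can extract context from an incoming request. HttpRequest stands in for your HTTP framework’s request type and request is an instance of it; getHeaderNames() (assumed to return an Iterable<String>) and getHeader() are assumed accessor methods that you should adapt to your framework.

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.Scope;
import io.opentelemetry.context.propagation.TextMapGetter;

// Tells the propagator how to read header values from the carrier
TextMapGetter<HttpRequest> getter =
  new TextMapGetter<HttpRequest>() {
    @Override
    public Iterable<String> keys(HttpRequest carrier) {
      return carrier.getHeaderNames();
    }

    @Override
    public String get(HttpRequest carrier, String key) {
      return carrier == null ? null : carrier.getHeader(key);
    }
  };

// Reads trace context headers (such as traceparent) from the incoming request
Context extractedContext = GlobalOpenTelemetry.getPropagators()
  .getTextMapPropagator()
  .extract(Context.current(), request, getter);

try (Scope scope = extractedContext.makeCurrent()) {
  // Spans started here become children of the remote caller's span
}

In this example, we first create a TextMapGetter that tells the propagator how to read header values from an HttpRequest. We then use this getter to extract the trace context from an incoming request, using the propagators configured on the globally registered OpenTelemetry instance (these are no-ops unless an SDK has been registered with a propagator such as W3C Trace Context). The extracted context is then made current for the enclosed block, so spans started there continue the caller’s trace.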
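The example above shows only the receiving side. Propagation has two halves: the caller injects its span context into outgoing headers, and the receiver extracts it. The round trip can be sketched with the W3C Trace Context propagator and a plain Map standing in for HTTP headers. This assumes opentelemetry-sdk on the classpath; PropagationSketch, inject, and extract are illustrative names.

```java
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanContext;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.api.trace.propagation.W3CTraceContextPropagator;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.propagation.TextMapGetter;
import io.opentelemetry.context.propagation.TextMapPropagator;
import io.opentelemetry.context.propagation.TextMapSetter;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import java.util.HashMap;
import java.util.Map;

final class PropagationSketch {
  static final TextMapPropagator PROPAGATOR = W3CTraceContextPropagator.getInstance();

  // Writes headers into a Map that stands in for an outgoing request.
  static final TextMapSetter<Map<String, String>> SETTER =
      (carrier, key, value) -> {
        if (carrier != null) {
          carrier.put(key, value);
        }
      };

  // Reads headers back out of the Map on the receiving side.
  static final TextMapGetter<Map<String, String>> GETTER =
      new TextMapGetter<Map<String, String>>() {
        @Override
        public Iterable<String> keys(Map<String, String> carrier) {
          return carrier.keySet();
        }

        @Override
        public String get(Map<String, String> carrier, String key) {
          return carrier == null ? null : carrier.get(key);
        }
      };

  // Caller side: write the span's context into outgoing headers.
  static Map<String, String> inject(Span span) {
    Map<String, String> headers = new HashMap<>();
    PROPAGATOR.inject(Context.current().with(span), headers, SETTER);
    return headers;
  }

  // Receiver side: rebuild the remote parent's span context from the headers.
  static SpanContext extract(Map<String, String> headers) {
    Context extracted = PROPAGATOR.extract(Context.root(), headers, GETTER);
    return Span.fromContext(extracted).getSpanContext();
  }

  public static void main(String[] args) {
    SdkTracerProvider provider = SdkTracerProvider.builder().build();
    Tracer tracer = provider.get("example-library");

    Span clientSpan = tracer.spanBuilder("outgoing-call").startSpan();
    Map<String, String> headers = inject(clientSpan);
    System.out.println(headers); // contains a "traceparent" entry

    SpanContext remote = extract(headers);
    System.out.println(remote.getTraceId().equals(clientSpan.getSpanContext().getTraceId()));

    clientSpan.end();
    provider.shutdown();
  }
}
```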

Best Practices for Working with Traces in OpenTelemetry 

Capture Information About Events and Errors

When working with traces in OpenTelemetry, it’s important to record the events and errors that occur during an operation on the Span where they happen. This can help you better understand your application’s behavior and identify potential issues.

Capturing detailed information about events can provide valuable insights into the execution flow of your application. This includes information such as the timing of the event, its associated attributes, and any related Spans.
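Events and errors can be attached to a Span with addEvent, recordException, and setStatus. The sketch below assumes opentelemetry-sdk and opentelemetry-sdk-testing on the classpath. chargeCard is a hypothetical operation that always fails so that the error path runs, and the event names and app.payment.attempt attribute are illustrative.

```java
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import io.opentelemetry.sdk.testing.exporter.InMemorySpanExporter;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.data.EventData;
import io.opentelemetry.sdk.trace.data.SpanData;
import io.opentelemetry.sdk.trace.export.SimpleSpanProcessor;

final class EventsAndErrorsSketch {
  static final AttributeKey<Long> ATTEMPT = AttributeKey.longKey("app.payment.attempt");

  // Hypothetical operation that always fails, so the error path is exercised.
  static void chargeCard() {
    throw new IllegalStateException("payment gateway timeout");
  }

  static void checkout(Tracer tracer) {
    Span span = tracer.spanBuilder("checkout").startSpan();
    try (Scope scope = span.makeCurrent()) {
      span.addEvent("cart validated");
      span.addEvent("payment attempt", Attributes.of(ATTEMPT, 1L));
      chargeCard();
    } catch (IllegalStateException e) {
      // Adds an "exception" event carrying the exception type, message, and stack trace
      span.recordException(e);
      // Marks the span itself as failed so backends can filter on errors
      span.setStatus(StatusCode.ERROR, "payment failed");
    } finally {
      span.end();
    }
  }

  public static void main(String[] args) {
    InMemorySpanExporter exporter = InMemorySpanExporter.create();
    SdkTracerProvider provider = SdkTracerProvider.builder()
        .addSpanProcessor(SimpleSpanProcessor.create(exporter))
        .build();

    checkout(provider.get("example-library"));

    SpanData data = exporter.getFinishedSpanItems().get(0);
    for (EventData event : data.getEvents()) {
      System.out.println("event: " + event.getName());
    }
    System.out.println("status: " + data.getStatus().getStatusCode());
    provider.shutdown();
  }
}
```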

Always Use Start and End Times

Another best practice when working with OpenTelemetry tracing is to make sure every Span is ended, so that it has both a start and an end time. A Span that is never ended is not exported, and its duration is lost. Properly ended Spans let you accurately measure the duration of each operation and identify potential performance issues.

Start and end times should be captured as precisely as possible, because small timing differences can matter when you analyze latency. This is especially important in distributed systems, where operations are often performed concurrently across multiple services. When an operation’s timing was measured elsewhere, you can supply explicit start and end timestamps instead of relying on the SDK’s clock.

By accurately capturing start and end times, you can gain a detailed view of your application’s execution flow and identify areas for optimization.
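When an operation’s timing is already known, explicit timestamps can be passed to the span builder and to end(). This is a sketch assuming opentelemetry-sdk and opentelemetry-sdk-testing on the classpath; recordMeasuredOperation, the span name, and the 250 ms duration are illustrative.

```java
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.sdk.testing.exporter.InMemorySpanExporter;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.data.SpanData;
import io.opentelemetry.sdk.trace.export.SimpleSpanProcessor;
import java.util.concurrent.TimeUnit;

final class ExplicitTimestampsSketch {
  // Records an operation whose start and end (epoch-based) were measured
  // elsewhere, e.g. timestamps carried in a queue message (hypothetical scenario).
  static void recordMeasuredOperation(Tracer tracer, long startEpochNanos, long endEpochNanos) {
    Span span = tracer.spanBuilder("batch-import")
        .setStartTimestamp(startEpochNanos, TimeUnit.NANOSECONDS)
        .startSpan();
    span.end(endEpochNanos, TimeUnit.NANOSECONDS);
  }

  public static void main(String[] args) {
    InMemorySpanExporter exporter = InMemorySpanExporter.create();
    SdkTracerProvider provider = SdkTracerProvider.builder()
        .addSpanProcessor(SimpleSpanProcessor.create(exporter))
        .build();

    long start = TimeUnit.MILLISECONDS.toNanos(System.currentTimeMillis());
    long end = start + TimeUnit.MILLISECONDS.toNanos(250);
    recordMeasuredOperation(provider.get("example-library"), start, end);

    // The exported duration comes from the supplied timestamps, not the SDK clock.
    SpanData data = exporter.getFinishedSpanItems().get(0);
    long durationMillis =
        TimeUnit.NANOSECONDS.toMillis(data.getEndEpochNanos() - data.getStartEpochNanos());
    System.out.println("duration: " + durationMillis + " ms");
    provider.shutdown();
  }
}
```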

Use Semantic Conventions as a Simplified Language

OpenTelemetry provides a set of semantic conventions: a shared vocabulary for describing frequently occurring operations such as HTTP requests, database queries, and messaging. These conventions provide a standardized way to name Spans, define their attributes, and associate them with events.

Using semantic conventions can greatly simplify the process of working with traces in OpenTelemetry. It allows you to describe your operations in a way that is easy to understand and analyze, regardless of the specific details of your application.

Semantic conventions also enable interoperability between different tracing tools and systems. This means that you can use OpenTelemetry tracing alongside other tools and still be able to understand and analyze your traces effectively.
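As an illustration, an outgoing HTTP call can be described with attribute names from the HTTP semantic conventions (http.request.method, url.full, http.response.status_code). This sketch defines the keys locally as plain AttributeKey constants rather than depending on a semantic-conventions artifact, assumes opentelemetry-sdk and opentelemetry-sdk-testing on the classpath, and uses a hypothetical tracedHttpGet that does not perform a real request. Attribute names have changed between convention releases, so check which version your backend expects.

```java
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import io.opentelemetry.sdk.testing.exporter.InMemorySpanExporter;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.data.SpanData;
import io.opentelemetry.sdk.trace.export.SimpleSpanProcessor;

final class SemanticConventionsSketch {
  // Attribute names taken from the HTTP semantic conventions, defined locally here.
  static final AttributeKey<String> HTTP_REQUEST_METHOD =
      AttributeKey.stringKey("http.request.method");
  static final AttributeKey<String> URL_FULL = AttributeKey.stringKey("url.full");
  static final AttributeKey<Long> HTTP_RESPONSE_STATUS_CODE =
      AttributeKey.longKey("http.response.status_code");

  // Hypothetical client call; statusCode stands in for a real response.
  static void tracedHttpGet(Tracer tracer, String url, long statusCode) {
    // The conventions favor low-cardinality span names, such as just the HTTP method.
    Span span = tracer.spanBuilder("GET")
        .setSpanKind(SpanKind.CLIENT)
        .setAttribute(HTTP_REQUEST_METHOD, "GET")
        .setAttribute(URL_FULL, url)
        .startSpan();
    try (Scope scope = span.makeCurrent()) {
      // ... perform the request here ...
      span.setAttribute(HTTP_RESPONSE_STATUS_CODE, statusCode);
    } finally {
      span.end();
    }
  }

  public static void main(String[] args) {
    InMemorySpanExporter exporter = InMemorySpanExporter.create();
    SdkTracerProvider provider = SdkTracerProvider.builder()
        .addSpanProcessor(SimpleSpanProcessor.create(exporter))
        .build();

    tracedHttpGet(provider.get("example-library"), "https://example.com/orders", 200L);

    SpanData data = exporter.getFinishedSpanItems().get(0);
    System.out.println(data.getKind() + " " + data.getName() + " " + data.getAttributes());
    provider.shutdown();
  }
}
```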

Use Sampling to Control Data Volumes

Tracing can generate large amounts of data, especially in large, distributed systems. This can lead to high storage costs and make it more difficult to analyze your traces. To mitigate this, OpenTelemetry provides a feature called sampling, which allows you to control the volume of collected data.

Sampling allows you to choose which traces to collect and which to discard based on various factors such as the operation type, the service, or the request rate. This can help you focus on the most important traces and reduce the collected data volume.

However, discarded traces cannot be recovered later, so choose your sampling strategy deliberately to keep the information you need while controlling data volumes.
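In the Java SDK, a sampler is set on the TracerProvider. The sketch below keeps a fixed fraction of new traces with a trace-ID-ratio sampler wrapped in a parent-based sampler, so child spans follow their parent’s decision. It assumes opentelemetry-sdk and opentelemetry-sdk-testing on the classpath; buildSampledProvider and the 10% ratio are illustrative choices, and this is head sampling only (decisions are made when a span starts).

```java
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.sdk.testing.exporter.InMemorySpanExporter;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.SimpleSpanProcessor;
import io.opentelemetry.sdk.trace.samplers.Sampler;

final class SamplingSketch {
  // Root spans are kept with probability `ratio` (decided from the trace ID);
  // child spans follow their parent's decision, so traces are kept or dropped whole.
  static SdkTracerProvider buildSampledProvider(double ratio, InMemorySpanExporter exporter) {
    return SdkTracerProvider.builder()
        .setSampler(Sampler.parentBased(Sampler.traceIdRatioBased(ratio)))
        .addSpanProcessor(SimpleSpanProcessor.create(exporter))
        .build();
  }

  public static void main(String[] args) {
    InMemorySpanExporter exporter = InMemorySpanExporter.create();
    SdkTracerProvider provider = buildSampledProvider(0.1, exporter);
    Tracer tracer = provider.get("example-library");

    // Each iteration starts a new root span, i.e. a new trace.
    for (int i = 0; i < 1000; i++) {
      Span span = tracer.spanBuilder("request").startSpan();
      span.end();
    }
    // Roughly 100, varying from run to run because trace IDs are random.
    System.out.println("kept " + exporter.getFinishedSpanItems().size() + " of 1000 traces");
    provider.shutdown();
  }
}
```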

Leverage Resource Attributes

OpenTelemetry also provides resource attributes, which describe the entity producing telemetry and are attached to every Span it emits. This can include information about the service, the environment, or the hardware on which your application is running.

Resource attributes can provide valuable context for your traces, helping you understand the conditions under which your application is running. This can be especially useful in distributed systems, where operations can be performed across different services and environments.
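In the Java SDK, resource attributes are set once on the TracerProvider. The sketch below merges the SDK’s default resource with service-specific attributes. It assumes opentelemetry-sdk and opentelemetry-sdk-testing on the classpath; service.name and service.version are standard resource attribute names, while the values and helper names are illustrative.

```java
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.sdk.resources.Resource;
import io.opentelemetry.sdk.testing.exporter.InMemorySpanExporter;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.SimpleSpanProcessor;

final class ResourceSketch {
  static final AttributeKey<String> SERVICE_NAME = AttributeKey.stringKey("service.name");
  static final AttributeKey<String> SERVICE_VERSION = AttributeKey.stringKey("service.version");

  // Starts from the SDK defaults (including telemetry.sdk.* attributes) and adds
  // service attributes; on conflicting keys, the resource passed to merge() wins.
  static Resource buildResource() {
    return Resource.getDefault().merge(Resource.create(Attributes.of(
        SERVICE_NAME, "checkout-service",
        SERVICE_VERSION, "2.3.1")));
  }

  static SdkTracerProvider buildProvider(Resource resource, InMemorySpanExporter exporter) {
    return SdkTracerProvider.builder()
        .setResource(resource)
        .addSpanProcessor(SimpleSpanProcessor.create(exporter))
        .build();
  }

  public static void main(String[] args) {
    InMemorySpanExporter exporter = InMemorySpanExporter.create();
    SdkTracerProvider provider = buildProvider(buildResource(), exporter);
    provider.get("example-library").spanBuilder("placeOrder").startSpan().end();

    // Every span exported by this provider carries the same resource.
    System.out.println(exporter.getFinishedSpanItems().get(0).getResource().getAttributes());
    provider.shutdown();
  }
}
```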

Microservices Monitoring with Lumigo

Lumigo is a cloud native observability tool that provides automated distributed tracing of microservice applications and supports OpenTelemetry for reporting of tracing data and resources. With Lumigo, users can:

  • See the end-to-end path of a transaction and full system map of applications
  • Monitor and debug third-party APIs and managed services (e.g., Amazon DynamoDB, Twilio, Stripe)
  • Go from alert to root cause analysis in one click
  • Understand system behavior and explore performance and cost issues 
  • Group services into business contexts

Get started with a free trial of Lumigo for your microservice applications 
