OpenTelemetry Collector: Architecture, Installation, and Debugging

What Is OpenTelemetry Collector?

The OpenTelemetry collector facilitates vendor-agnostic telemetry data collection, processing, and exporting. It lets you use one agent to collect and forward multiple types of telemetry in a variety of formats instead of running, operating, and maintaining several agents.

The collector improves scalability and flexibility, supporting various open source observability data formats, including Jaeger, Fluent Bit, and Prometheus. It can send information to one or more monitoring and distributed tracing back-ends, including open-source and commercial tools. Additionally, it provides a local collector agent as a default location for instrumentation libraries to export telemetry data.

In this article:

  • Why You Should Use the OpenTelemetry Collector
  • OpenTelemetry Collector Architecture
    • Receivers
    • Processors
    • Exporters
    • The OpenTelemetry Pipeline
    • Extensions
    • Service
  • Installing the OpenTelemetry Collector
    • Kubernetes
    • Linux
    • Windows
    • Local
  • Debugging the OpenTelemetry Collector
    • Logs
    • Metrics
    • Local Exporters
    • Health Check
  • Microservices Monitoring with Lumigo

Why You Should Use the OpenTelemetry Collector

The OpenTelemetry collector is responsible for collecting, processing, and exporting telemetry data. It is vendor-agnostic and can reduce both operational cost and management overhead. Some key capabilities of the OpenTelemetry collector are:

  • Reduce the need to restart applications—making configuration changes to OpenTelemetry clients requires restarting the application process. You can mitigate this issue by running OpenTelemetry clients pointing to a collector. As a result, configuration changes to the observability pipeline restart the collector instead of the application process.
  • Prevent resource contention—when you install data processing and transformation plugins, clients will consume system resources inside the applications. You can more easily manage telemetry at scale by running a pool of data-processing collectors on separate machines or containers on the same private network, and thus have a predictable amount of computing resources allocated.
  • Avoid hanging on shutdown—when networking problems occur, flushing the remaining telemetry from an application to a remote backend can slow down or hang on shutdown. It is safer and more resilient, instead, to flush the data to a collector that runs on a local network to ensure that the data leaves the application quickly.

OpenTelemetry Collector Architecture

The OpenTelemetry collector has a modular architecture that allows it to ingest, collect, transform and export a variety of telemetry formats.

[Image: OpenTelemetry Collector architecture diagram. Image Source: OpenTelemetry]

The Collector is made up of the following components:

Receivers

A receiver is how data gets into the Collector. There are two types of receivers: push-based and pull-based. A receiver typically accepts data in a specified format, translates it into the Collector's internal format, and passes it to the processors and exporters defined in the OpenTelemetry pipeline. The formats of traces and metrics supported are receiver-specific.
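As a sketch, a push-based OTLP receiver and a pull-based Prometheus receiver might be configured like this (the endpoints, job name, and scrape target are illustrative values, not required defaults):

```yaml
receivers:
  # Push-based: applications send OTLP data to these endpoints
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  # Pull-based: the Collector scrapes a Prometheus endpoint itself
  prometheus:
    config:
      scrape_configs:
        - job_name: "example-app"         # illustrative job name
          scrape_interval: 30s
          static_configs:
            - targets: ["localhost:9090"] # illustrative target
```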

Processors

Processors sit between receivers and exporters and act on data before it is exported. They can transform metrics and rename spans. They also support batching data before sending, retrying failed exports, adding metadata, and performing tail-based sampling.
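For illustration, the batching and metadata capabilities described above might look like this in a configuration (the batch sizes and the attribute name/value are arbitrary example values):

```yaml
processors:
  # Buffer telemetry and send it in batches to reduce outbound requests
  batch:
    send_batch_size: 1024
    timeout: 5s
  # Add metadata: attach an attribute to every span passing through
  attributes:
    actions:
      - key: environment   # illustrative attribute key
        value: staging     # illustrative attribute value
        action: insert
```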

Exporters

Exporters can export data to several open source and commercial back-ends. For example, a console exporter makes it possible to export log data to console output, while a file exporter can dump data to a file.
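The two examples just mentioned might be configured as follows; exporter names and available options vary by Collector version, and the file path here is an illustrative choice:

```yaml
exporters:
  # Print telemetry to console output
  logging:
  # Dump telemetry to a local file
  file:
    path: /tmp/otel-output.json  # illustrative path
```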

The OpenTelemetry Pipeline

The preceding three components—receivers, processors, and exporters—make up the OpenTelemetry pipeline, which defines how telemetry data is collected and handled.

In addition to pipeline components, there are two other components that assist in data handling and communication.

Extensions

Extensions are optional, and provide additional functionality not supported by the default collector. They do not need direct access to telemetry data. Three extensions provided by OpenTelemetry are health_check, pprof, and zpages (learn more in the documentation). You can also create your own extensions.

Service

The service section is used to enable components that are configured in the receivers, processors, exporters, and extensions sections. It consists of two subsections: extensions and pipelines.

  • The extensions section contains a list of all extensions you want to activate.
  • The pipelines section can define traces, metrics, or logs pipelines. Each pipeline consists of a set of receivers, processors, and exporters. Each of these components must first be defined in configuration sections outside the service section, and can then be referenced in this section to be included in a pipeline.
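Putting the pieces together, a minimal configuration wiring previously defined components into a traces pipeline might look like this (a sketch, not a production configuration):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
processors:
  batch:
exporters:
  logging:
extensions:
  health_check:
service:
  # Activate the extensions listed here
  extensions: [health_check]
  # Reference components defined above to build a traces pipeline
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]
```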

Related content: Read our guide to OpenTelemetry architecture

Installing the OpenTelemetry Collector

The OpenTelemetry Collector offers one binary and two deployment options:

  • Agents—one or several Collector instances running on the same host as an application or alongside it (binaries, sidecars, daemon sets, etc.).
  • Gateway—contains one or several Collector instances, typically running as separate services (e.g., containers or deployments) for each cluster, region, or datacenter.
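To illustrate how the two options combine, an agent instance can forward everything it collects to a central gateway over OTLP. The gateway hostname below is a hypothetical value for the sketch:

```yaml
# Agent configuration: forward locally collected data to a central gateway
receivers:
  otlp:
    protocols:
      grpc:
processors:
  batch:
exporters:
  otlp:
    endpoint: otel-gateway.example.com:4317  # hypothetical gateway address
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```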

There are two Collector versions available from the OpenTelemetry project:

  • Core—includes basic components (e.g., configuration, universally applicable receivers, exporters, processors, and extensions). It supports popular open source projects such as Jaeger, Fluent Bit, and Prometheus.
  • Contrib—includes all components of the core version plus optional or experimental components, including specialized and vendor-specific receivers, exporters, processors, and extensions.

Kubernetes

On Kubernetes, you can deploy the Collector as an agent daemon set plus a single gateway instance:

$ kubectl apply -f https://raw.githubusercontent.com/open-telemetry/opentelemetry-collector/main/examples/k8s/otel-config.yaml

The example above provides a starting point for extending and customizing the configuration before use in a live production environment.

You can also use the OpenTelemetry Operator to configure and maintain OpenTelemetry Collector instances using automatic upgrade management, OpenTelemetry configuration-based service configuration, and automatic sidecar insertion into a deployment.
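With the Operator installed, Collector instances are described declaratively through its OpenTelemetryCollector custom resource. The sketch below assumes the v1alpha1 API version and uses an illustrative resource name; consult the Operator documentation for the versions available in your cluster:

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: example-collector    # illustrative name
spec:
  mode: daemonset            # run one agent instance per node
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
    exporters:
      logging:
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: []
          exporters: [logging]
```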

Linux

All Collector releases include RPM and DEB packaging for Linux amd64 and arm64 systems. After installation, these packages place a default configuration at /etc/otelcol/config.yaml.

Windows

Windows releases are packaged as gzipped tarballs (.tar.gz). You need to unpack them using a tool that supports this format.

All Collector versions include an executable otelcol.exe that you can run after unpacking.

Local

You can build the latest version of the Collector for your local operating system, enable all receivers when running the binary, and export any data it receives to a local file, while the Collector scrapes its own Prometheus metrics. The following example runs the Collector using two terminal windows. Run the following script in the first terminal window:

$ git clone https://github.com/open-telemetry/opentelemetry-collector.git
$ cd opentelemetry-collector
$ make install-tools
$ make otelcorecol
$ ./bin/otelcorecol_* --config ./examples/local/otel-config.yaml

You can test your new Collector in the second terminal window using the following script:

$ git clone https://github.com/open-telemetry/opentelemetry-collector-contrib.git
$ cd opentelemetry-collector-contrib/examples/demo/server
$ go build -o main main.go; ./main & pid1="$!"
$ cd ../client
$ go build -o main main.go; ./main

You can stop the client using the Ctrl-c command. Stop the server using the kill $pid1 command. You can also stop the Collector using the Ctrl-c command in the relevant terminal window.

Debugging the OpenTelemetry Collector

There are four main ways you can check the health of the Collector and investigate issues: logs, metrics, local exporters, and health checks.

Logs

Collector logs are the first place you should look to identify and resolve issues. The default log verbosity level is INFO—you can change this as needed.

To define the log level, set the service::telemetry::logs property:

service:
  telemetry:
    logs:
      level: "debug"

Metrics

The OpenTelemetry Collector automatically exposes Prometheus metrics on port 8888 at the path /metrics. In a containerized environment, it can make sense to expose this port on a public interface to ensure you can troubleshoot even if not connected directly to the relevant host.

You can customize the port using the service::telemetry::metrics property:

service:
  telemetry:
    metrics:
      address: ":8888"
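On the Prometheus side, scraping the Collector's own metrics is then a matter of pointing a scrape job at that address; the job name, interval, and hostname below are illustrative:

```yaml
scrape_configs:
  - job_name: "otel-collector"            # illustrative job name
    scrape_interval: 15s
    static_configs:
      - targets: ["otel-collector:8888"]  # host running the Collector
```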

Local Exporters

If necessary, you can define a local exporter to inspect data being processed by the Collector. For example, here is how to use the file exporter to confirm that data is being correctly received and processed:

receivers:
  otlp:
    http:
    grpc:
exporters:
  file:
    path: spans.json
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: []
      exporters: [file]

Health Check

The OpenTelemetry Collector provides the health_check extension, which listens on port 13133 by default once enabled. You can use health checks to verify that a Collector instance is working. Here is how to enable the health check:

extensions:
  health_check:
service:
  extensions: [health_check]

The response should look like this:

{
  "status": "Server available",
  "upSince": "2022-04-28T05:33:07.8769865Z",
  "uptime": "22.3139965s"
}

Microservices Monitoring with Lumigo

Lumigo is a cloud native observability tool, purpose-built to navigate the complexities of microservices. Through automated distributed tracing, Lumigo is able to stitch together the distributed components of an application in one complete view, and track every service of every request.

Taking an agentless approach to monitoring, Lumigo sees through the black boxes of third parties, APIs and managed services.

With Lumigo users can:

  • See the end-to-end path of a transaction and full system map of applications
  • Monitor and debug third party APIs and managed services (ex. Amazon DynamoDB, Twilio, Stripe)
  • Go from alert to root cause analysis in one click
  • Understand system behavior and explore performance and cost issues
  • Group services into business contexts

Get started with a free trial of Lumigo for your microservice applications.