Grafana Tempo

This page describes how to configure Grafana Tempo for Kiali.

Grafana Tempo Configuration

There are two possibilities to integrate Kiali with Grafana Tempo:

Using the Grafana Tempo API

There are two steps to set up Kiali and Grafana Tempo:

Set up the Kiali CR

This is a configuration example to set up Kiali tracing with Grafana Tempo:

spec:
  external_services:
    tracing:
      # Enabled by default. Kiali will anyway fallback to disabled if
      # Tempo is unreachable.
      enabled: true
      health_check_url: "https://tempo-instance.grafana.net"
      # Tempo service name is "query-frontend" and is in the "tempo" namespace.
      # Make sure the URL you provide corresponds to the non-GRPC enabled endpoint
      # It does not support grpc yet, so make sure "use_grpc" is set to false.
      internal_url: "http://tempo-tempo-query-frontend.tempo.svc.cluster.local:3200/"
      provider: "tempo"
      tempo_config:
        org_id: "1"
        datasource_uid: "a8d2ef1c-d31c-4de5-a90b-e7bc5252cd00"
        url_format: "grafana"
      use_grpc: false
      # Public facing URL of Tempo 
      external_url: "https://tempo-tempo-query-frontend-tempo.apps-crc.testing/"

Kiali will use the external_url to redirect to the Tracing UI, in the “View in tracing” links. In Tempo, by default, the url_format is set to grafana. This will use a particular url path and query for each link. The default UI for Grafana Tempo is Grafana, so it will use the external_url set in the Grafana section, such as this example:

spec:
  external_services:
    grafana:
      enabled: true
      external_url: https://grafana.apps-crc.testing/

It is also possible to set url_format to “jaeger”. In that case, Kiali will use the external_url set in the tracing section, and the url path and query will be following the Jaeger UI format.

Set up a Tempo Datasource in Grafana

We can optionally set up a default Tempo datasource in Grafana so that you can view the Tempo tracing data within the Grafana UI, as you see here:

Kiali grafana_tempo

To set up the Tempo datasource, go to the Home menu in the Grafana UI, click Data sources, then click the Add new data source button and select the Tempo data source. You will then be asked to enter some data to configure the new Tempo data source:

Kiali grafana_tempo

The most important values to set up are the following:

  • Mark the data source as default, so the URL that Kiali uses will redirect properly to the Tempo data source.
  • Update the HTTP URL. This is the internal URL of the HTTP tempo frontend service. e.g. http://tempo-tempo-query-frontend.tempo.svc.cluster.local:3200/

Additional configuration

The Traces tab in the Kiali UI will show your traces in a bubble chart:

Kiali grafana_tempo

Increasing performance is achievable by enabling gRPC access, specifically for query searches. However, accessing the HTTP API will still be necessary to gather information about individual traces. This is an example to configure the gRPC access:

spec:
  external_services:
    tracing:
      enabled: true
      # grpc port defaults to 9095
      grpc_port: 9095 
      internal_url: "http://query-frontend.tempo:3200"
      provider: "tempo"
      use_grpc: true
      external_url: "http://my-tempo-host:3200"
Service check URL

By default, Kiali will check the service health in the endpoint /status/services, but sometimes, this is exposed in a different url, which can lead to a component unreachable message:

component_unreachable

This can be changed with the health_check_url configuration option.

spec:
  external_services:
    tracing:
      health_check_url: "http://query-frontend.tempo:3200"
Configuration for the Grafana Tempo Datasource

In order to correctly redirect Kiali to the right Grafana Tempo Datasource, there are a couple of configuration options to update:

spec:
  external_services:
    tracing:
      tempo_config:
        org_id: "1"
        datasource_uid: "a8d2ef1c-d31c-4de5-a90b-e7bc5252cd00"

org_id is usually not needed since “1” is the default value which is also Tempo’s default org id. The datasource_uid needs to be updated in order to redirect to the right datasource in Grafana versions 10 or higher.

Using the Jaeger frontend with Grafana Tempo tracing backend

It is possible to use the Grafana Tempo tracing backend exposing the Jaeger API. tempo-query is a Jaeger storage plugin. It accepts the full Jaeger query API and translates these requests into Tempo queries.

Since Tempo is not yet part of the built-in addons that are part of Istio, you need to manage your Tempo instance.

Tanka

The official Grafana Tempo documentation explains how to deploy a Tempo instance using Tanka. You will need to tweak the settings from the default Tanka configuration to:

  • Expose the Zipkin collector
  • Expose the GRPC Jaeger Query port

When the Tempo instance is deployed with the needed configurations, you have to set meshConfig.defaultConfig.tracing.zipkin.address from Istio to the Tempo Distributor service and the Zipkin port. Tanka will deploy the service in distributor.tempo.svc.cluster.local:9411.

The external_services.tracing.internal_url Kiali option needs to be set to: http://query-frontend.tempo.svc.cluster.local:16685.

Tempo Operator

The Tempo Operator for Kubernetes provides a native Kubernetes solution to deploy Tempo easily in your system.

After installing the Tempo Operator in your cluster, you can create a new Tempo instance with the following CR:

kubectl create namespace tempo
kubectl apply -n tempo -f - <<EOF
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: smm
spec:
  storageSize: 1Gi
  storage:
    secret:
      type: s3
      name: object-storage
  template:
    queryFrontend:
      component:
        resources:
          limits:
            cpu: "2"
            memory: 2Gi
      jaegerQuery:
        enabled: true
        ingress:
          type: ingress
EOF

Note the name of the bucket where the traces will be stored in our example is called object-storage. Check the Tempo Operator documentation to know more about what storages are supported and how to create the secret properly to provide it to your Tempo instance.

Now, you are ready to configure the meshConfig.defaultConfig.tracing.zipkin.address field in your Istio installation. It needs to be set to the 9411 port of the Tempo Distributor service. For the previous example, this value will be tempo-smm-distributor.tempo.svc.cluster.local:9411.

Now, you need to configure the internal_url setting from Kiali to access the Jaeger API. You can point to the 16685 port to use GRPC or 16686 if not. For the given example, the value would be http://tempo-ssm-query-frontend.tempo.svc.cluster.local:16685.

There is a related tutorial with detailed instructions to setup Kiali and Grafana Tempo with the Operator.

Configuration table

Supported versions

Kiali Version
Jaeger
Tempo
Tempo with JaegerQuery
<= 1.79 (OSSM 2.5)
> 1.79

Minimal configuration for Kiali <= 1.79

In external_services.tracing

http
grpc
Jaeger .internal_url = 'http://jaeger_service_url:16686/jaeger'
.use_grpc = false
.internal_url = 'http://jaeger_service_url:16685/jaeger'
.use_grpc = true (Not required: by default)

Tempo .internal_url = 'http://query_frontend_url:16686'
.use_grpc = false
.internal_url = 'http://query_frontend_url:16685'
.use_grpc = true (Not required: by default)


Minimal configuration for Kiali > 1.79

http
grpc
Jaeger .internal_url = 'http://jaeger_service_url:16686/jaeger'
.use_grpc = false
.internal_url = 'http://jaeger_service_url:16685/jaeger'
.use_grpc = true (Not required: by default)

Tempo
internal_url = 'http://query_frontend_url:3200'
.use_grpc = false
.provider = 'tempo'

.internal_url = 'http://query_frontend_url:3200'
.grpc_port: 9095
.provider: 'tempo'
.use_grpc = true (Not required: by default)

Tempo tuning

Resources consumption

Grafana Tempo is a powerful tool, but it can lead to performance issues when not configured correctly. For example, the following configuration is not recommended and may lead to OOM issues for simple queries in the query-frontend component:

spec:
  resources:
    total:
      limits:
        memory: 2Gi
        cpu: 2000m

These resources are shared between all the Tempo components. When needed, apply resources to each specific component, instead of applying the resources globally:

spec:
  template:
    queryFrontend:
      component:
        resources:
          limits:
            cpu: "2"
            memory: 2Gi

This Grafana Dashboard is available to measure the resources used in the tempo namespace.

Caching

Tempo offers multi-level caching that is used by default with Tanka and Helm deployment examples. It uses external cache, supporting Memcached and Redis. The lower level cache has a higher hit rate, and caches bloom filters and parquet data. The higher level caches frontend-search data.

Optimizing the cache depends on the application usage, and can be done modifying different parameters:

  • Connection limit for MemCached: Should be increased in large deployments, as MemCached is set to 1024 by default.
  • Cache size control: Should be increased when the working set is larger than the size of cache.

Tune search pipeline

There are many parameters to tune the search pipeline, some of these:

  • max_concurrent_queries: If it is too high it can cause OOM.
  • concurrent_jobs: How many jobs are done concurrently.
  • max_retries: When it is too high it can result in a lot of load.

Dedicated attribute columns

When using the vParquet3 storage format , defining dedicated attribute columns can improve the query performance. In order to best choose those columns (Up to 10), a good criteria is to choose attributes that contribute growing the block size (And not those commonly used).

Tempo authentication configuration

The Kiali CR provides authentication configuration that will be used also for querying the version check to provide information in the Mesh graph.

spec:
  external_services:
    tracing:
      enabled: true
      auth:
        ca_file: ""
        insecure_skip_verify: false
        password: "pwd"
        token: ""
        type: "basic"
        use_kiali_token: false
        username: "user"
      health_check_url: ""

To configure a secret to be used as a password, see this FAQ entry

Tempo cache

Kiali 2.2 includes a simple tracing cache for Tempo that stores the last N traces. By default, it is enabled and it keeps the last 200 traces. It can be modified in the Kiali CR with:

spec:
  external_services:
    tracing:
      enabled: true
      tempo_config:
        cache_enabled: true
        cache_capacity: 200

Kiali emits some cache metrics. The following query obtains the cache hit rate:

(sum(kiali_cache_hits_total{name="tempo"})/sum(kiali_cache_requests_total{name="tempo"})) * 100

tempo_metrics_cache

Last modified December 4, 2024: Tempo cache docs (#840) (84cd739)