Kiali: Service mesh observability and configuration

Runtimes Monitoring

1. Prometheus Configuration
2. Pods Annotations and Auto-discovery
3. Default dashboards
4. Create new dashboards
5. Units

Kiali can display custom dashboards to monitor application metrics. They are available for Applications and Workloads.

To display custom dashboards, you must set the app and version labels on your pods. These labels are necessary for Kiali to identify which application and workload the metrics originate from.

1. Prometheus Configuration

Kiali runtimes monitoring feature works exclusively with Prometheus, so it must be configured correctly to pull your application metrics.

If you are using the default Istio installation, your Prometheus instance should already be configured as shown below and you can skip to the next section.

But if you want to use another Prometheus instance, please refer to Prometheus documentation to setup kubernetes_sd_config for pods. As a reference, here is how it is configured in Istio.

It is important to preserve label mapping, so that Kiali can filter on app and version, which is done with this configuration:

      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)

Additionally, you must configure the Prometheus server URL in the Kiali CR:

# ...
external_services:
  prometheus:
    custom_metrics_url: URL_TO_SERVER
# ...

Make sure that Kiali pod can reach this URL.

2. Pods Annotations and Auto-discovery

Application pods must be annotated for the Prometheus scraper, for example, within a Deployment definition:

spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/scheme: http
        prometheus.io/path: "/metrics"

prometheus.io/scrape tells Prometheus to fetch these metrics or not
prometheus.io/port is the port under which metrics are exposed
prometheus.io/scheme must be set to http for non-secure or https for secure connection
prometheus.io/path is the endpoint path where metrics are exposed, default is /metrics

Kiali will try to discover automatically dashboards that are relevant for a given Application or Workload. To do so, it reads their metrics and try to match them with the discoverOn field defined on dashboards.

But if you can’t rely on automatic discovery, you can explicitly annotate the pods to associate them with Kiali dashboards.

spec:
  template:
    metadata:
      annotations:
        # (prometheus annotations...)
        kiali.io/runtimes: vertx-server

kiali.io/runtimes is a coma-separated list of runtimes / dashboards that Kiali will look for

3. Default dashboards

Since version 0.15, Kiali comes with a set of default dashboards for various runtimes. We keep adding new ones over time.

3.1. Go

Contains metrics such as the number of threads, goroutines, and heap usage. The expected metrics are provided by the Prometheus Go client.

Example to expose default Go metrics:

        http.Handle("/metrics", promhttp.Handler())
        http.ListenAndServe(":2112", nil)

As an example and for self-monitoring purpose Kiali itself exposes Go metrics.

The pod annotation for Kiali is: kiali.io/runtimes: go

3.2. Envoy

Istio’s Envoy sidecars supply some internal metrics, that can be viewed in Kiali. They are different than the metrics reported by Istio Telemetry, which Kiali uses extensively. Some of Envoy’s metrics may be redundant.

Unlike the other Runtimes dashboards, there is no automatic discovery configured for Envoy. You must explicitly enable the Envoy dashboard with the pod annotation kiali.io/runtimes: envoy.

Note that the enabled Envoy metrics can be tuned, as explained in the Istio documentation: it’s possible to get more metrics using the statsInclusionPrefixes annotation. Make sure you include cluster_manager and listener_manager as they are required.

For example, sidecar.istio.io/statsInclusionPrefixes: cluster_manager,listener_manager,listener will add listener metrics for more inbound traffic information. You can then customize the Envoy dashboard of Kiali according to the collected metrics.

3.3. Node.js

Contains metrics such as active handles, event loop lag, and heap usage. The expected metrics are provided by prom-client.

Example of Node.js metrics for Prometheus:

const client = require('prom-client');
client.collectDefaultMetrics();
// ...
app.get('/metrics', (request, response) => {
  response.set('Content-Type', client.register.contentType);
  response.send(client.register.metrics());
});

Full working example: https://github.com/jotak/bookinfo-runtimes/tree/master/ratings

The pod annotation for Kiali is: kiali.io/runtimes: nodejs

3.4. Quarkus

Contains JVM-related, GC usage metrics. The expected metrics can be provided by SmallRye Metrics, a MicroProfile Metrics implementation. Example with maven:

    <dependency>
      <groupId>io.quarkus</groupId>
      <artifactId>quarkus-smallrye-metrics</artifactId>
    </dependency>

The pod annotation for Kiali is: kiali.io/runtimes: quarkus

3.5. Spring Boot

Three dashboards are provided: one for JVM memory / threads, another for JVM buffer pools and the last one for Tomcat metrics. The expected metrics come from Spring Boot Actuator for Prometheus. Example with maven:

    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
      <groupId>io.micrometer</groupId>
      <artifactId>micrometer-core</artifactId>
    </dependency>
    <dependency>
      <groupId>io.micrometer</groupId>
      <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>

Full working example: https://github.com/jotak/bookinfo-runtimes/tree/master/details

The pod annotation for Kiali with the full list of dashboards is: kiali.io/runtimes: springboot-jvm,springboot-jvm-pool,springboot-tomcat

By default, the metrics are exposed on path /actuator/prometheus, so it must be specified in the corresponding annotation: prometheus.io/path: "/actuator/prometheus"

3.6. Thorntail

Contains mostly JVM-related metrics such as loaded classes count, memory usage, etc. The expected metrics are provided by the MicroProfile Metrics module. Example with maven:

    <dependency>
      <groupId>io.thorntail</groupId>
      <artifactId>microprofile-metrics</artifactId>
    </dependency>

Full working example: https://github.com/jotak/bookinfo-runtimes/tree/master/productpage

The pod annotation for Kiali is: kiali.io/runtimes: thorntail

3.7. Vert.x

Several dashboards are provided, related to different components in Vert.x: HTTP client/server metrics, Net client/server metrics, Pools usage, Eventbus metrics and JVM. The expected metrics are provided by the vertx-micrometer-metrics module. Example with maven:

    <dependency>
      <groupId>io.vertx</groupId>
      <artifactId>vertx-micrometer-metrics</artifactId>
    </dependency>
    <dependency>
      <groupId>io.micrometer</groupId>
      <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>

Init example of Vert.x metrics, starting a dedicated server (other options are possible):

      VertxOptions opts = new VertxOptions().setMetricsOptions(new MicrometerMetricsOptions()
          .setPrometheusOptions(new VertxPrometheusOptions()
              .setStartEmbeddedServer(true)
              .setEmbeddedServerOptions(new HttpServerOptions().setPort(9090))
              .setPublishQuantiles(true)
              .setEnabled(true))
          .setEnabled(true));

Full working example: https://github.com/jotak/bookinfo-runtimes/tree/master/reviews

The pod annotation for Kiali with the full list of dashboards is: kiali.io/runtimes: vertx-client,vertx-server,vertx-eventbus,vertx-pool,vertx-jvm

4. Create new dashboards

The default dashboards described above are just examples of what we can have. It’s pretty easy to create new dashboards.

When installing Kiali, a new CRD is installed in the system: monitoringdashboard.monitoring.kiali.io. It declares the resource kind MonitoringDashboard. Here’s what this resource looks like:

apiVersion: "monitoring.kiali.io/v1alpha1"
kind: MonitoringDashboard
metadata:
  name: vertx-custom
spec:
  runtime: Vert.x
  title: Vert.x Metrics
  discoverOn: "vertx_http_server_connections"
  items:
  - chart:
      name: "Server response time"
      unit: "seconds"
      spans: 6
      metricName: "vertx_http_server_responseTime_seconds"
      dataType: "histogram"
      aggregations:
      - label: "path"
        displayName: "Path"
      - label: "method"
        displayName: "Method"
  - chart:
      name: "Server active connections"
      unit: ""
      spans: 6
      metricName: "vertx_http_server_connections"
      dataType: "raw"
  - include: "micrometer-1.1-jvm"

The name field (from metadata) corresponds to what you can set in pods annotation kiali.io/runtimes.

Spec fields definitions are:

runtime: name of the related runtime. It will be displayed on the corresponding Workload Details page. If omitted no name is displayed.
title: dashboard title, displayed as a tab in Application or Workloads Details
discoverOn: metric name to match for auto-discovery. If omitted, the dashboard won’t be discovered automatically, but can still be used via pods annotation.
items: can be either chart, to define a new chart, or include to reference another dashboard
- chart: new chart object
  - name: name of the chart
  - chartType: type of the chart, can be one of line (default), area or bar
  - unit: unit for Y-axis. Free-text field to provide any unit suffix. It can eventually be scaled on display. See specific section below.
  - spans: number of "spans" taken by the chart, from 1 to 12, using bootstrap convention
  - metricName: the metric name in Prometheus
  - dataType: type of data to be displayed in the chart. Can be one of raw, rate or histogram. Raw data will be queried without transformation. Rate data will be queried using promQL rate() function. And histogram with histogram_quantile() function.
  - min and max: domain for Y-values. When unset, charts implementations should usually automatically adapt the domain with the displayed data.
  - aggregator: defines how the time-series are aggregated when several are returned for a given metric and label set. For example, if a Deployment creates a ReplicaSet of several Pods, you will have at least one time-series per Pod. Since Kiali shows the dashboards at the workload (ReplicaSet) level or at the application level, they will have to be aggregated. This field can be used to fix the aggregator, with values such as sum or avg (full list available in Prometheus documentation). However, if omitted the aggregator will default to sum and can be changed from the dashboard UI.
  - aggregations: list of labels eligible for aggregations / groupings (they will be displayed in Kiali through a dropdown list)
    
    label: Prometheus label name
    
    displayName: Name to display in Kiali
- include: to include another dashboard, or a specific chart from another dashboard. Typically used to compose with generic dashboards such as the ones about MicroProfile Metrics or Micrometer-based JVM metrics. To reference a full dashboard, set the name of that dashboard. To reference a specific chart of another dashboard, set the name of the dashboard followed by $ and the name of the chart (ex: include: "microprofile-1.1$Thread count").
externalLinks: a list of related external links (e.g. to Grafana dashboards)
- name: link name to be displayed
- type: link type, currently only grafana is allowed
- variables: a set of variables that can be injected in the URL. For instance, with something like namespace: var-namespace and app: var-app, an URL to a Grafana dashboard that manages namespace and app variables would look like: http://grafana-server:3000/d/xyz/my-grafana-dashboard?var-namespace=some-namespace&var-app=some-app. The available variables in this context are namespace, app and version.

Label clash: you should try to avoid labels clashes within a dashboard. In Kiali, labels for grouping are aggregated in the top toolbar, so if the same label refers to different things depending on the metric, you wouldn’t be able to distinguish them in the UI. For that reason, ideally, labels should not have too generic names in Prometheus. For instance labels named "id" for both memory spaces and buffer pools would better be named "space_id" and "pool_id". If you have control on label names, it’s an important aspect to take into consideration. Else, it is up to you to organize dashboards with that in mind, eventually splitting them into smaller ones to resolve clashes.

Dashboard resources are added in Kubernetes just like any other resource:

kubectl apply -f mydashboard.yml

Or for OpenShift:

oc apply -f mydashboard.yml

To make the dashboard resources available cluster-wide, just create them in Kiali namespace (usually istio-system). Else, they will be available only for applications or workloads of the same namespace. In the case where the same dashboard name exists in a specific namespace and in Kiali namespace, the former takes precedence.

5. Units

Some units are recognized in Kiali and scaled appropriately when displayed on charts:

unit: "seconds" can be scaled down to ms, µs, etc.
unit: "bytes-si" and unit: "bitrate-si" can be scaled up to kB, MB (etc.) using SI / metric system. The aliases unit: "bytes" and unit: "bitrate" can be used instead.
unit: "bytes-iec" and unit: "bitrate-iec" can be scaled up to KiB, MiB (etc.) using IEC standard / IEEE 1541-2002 (scale by powers of 2).

Other units will fall into the default case and be scaled using SI standard. For instance, unit: "m" for meter can be scaled up to km.