Kiali Logo

Health Configuration

1. Health

Kiali calculates health by combining the individual health of several indicators, such as pods and request traffic. The global health of a resource reflects the most severe health of its indicators.

1.1. Health Indicators

The table below lists the current health indicators and whether the indicator supports custom configuration for its health calculation.

Indicator Supports Configuration
Pod StatusNo
Traffic HealthYes

1.2. Icons and colors

Kiali use icons and colors to indicate the health of resources and associated request traffic.

  • No Health Information (NA)
  • Healthy
  • Degraded
  • Failure

1.3. Default Values

1.3.1. Request Traffic

By default Kiali uses the traffic rate configuration shown below. Application errors have minimal tolerance while client errors have a higher tolerance reflecting that some level of client errors is often normal (e.g. 404 Not Found):

  • For http protocol 4xx are client errors and 5xx codes are application errors.

  • For grpc protocol all 1-16 are errors (0 is success).

So, for example, if the rate of application errors is >= 0.1% kiali will show Degraded health and if > 10% will show Failure health.

# ...
  health_config:
    rate:
      - namespace: ".*"
        kind: ".*"
        name: ".*"
        tolerance:
          - code: "^5\\d\\d$"
            direction: ".*"
            protocol: "http"
            degraded: 0
            failure: 10
          - code: "^4\\d\\d$"
            direction: ".*"
            protocol: "http"
            degraded: 10
            failure: 20
          - code: "^[1-9]$|^1[0-6]$"
            direction: ".*"
            protocol: "grpc"
            degraded: 0
            failure: 10
# ...

1.4. Configuration

Custom health configuration is specified in the Kiali CR. To see the supported configuration syntax for health_config visit Kiali CR.

Kiali applies the first matching rate configuration (namespace, kind, etc) and calculates the status for each tolerance. The reported health will be the status with highest priority (see below).

Rate OptionDefinitionDefault
namespaceMatching Namespaces (regex)".*" (match all)
kindMatching Resource Types (workload|app|service) (regex)".*" (match all)
nameMatching Resource Names (regex)".*" (match all)
ToleranceArray of tolerances to apply.
Tolerance Option Definition Default
code Matching Response Status Codes (regex) [1] required
direction Matching Request Directions (inbound|outbound) (regex) ".*" (match all)
protocol Matching Request Protocols (http|grpc) (regex) ".*" (match all)
degraded Degraded Threshold(% matching requests >= value) 0
failure Failure Threshold (% matching requests >= value) 0

[1] The status code typically depends on the request protocol. The special code -, a single dash, is used for requests that don’t receive a response, and therefore no response code.

Kiali reports traffic health with the following top-down status priority :

Priority Rule (value=% matching requests) Status
1 value >= FAILURE threshold FAILURE
2 value >= DEGRADED threshold AND value < FAILURE threshold DEGRADED
3 value > 0 AND value < DEGRADED threshold HEALTHY
4 value = 0 HEALTHY
5 No traffic No Health Information

2. Examples

In this repo we can see 2 namespaces: alpha and beta (Demo design).

Alpha

Where nodes return the responses (You can configure responses here ):

AppAlpha/Beta
Code Rate
x-server 200 9
404 1
y-server 200 9
500 1
z-server 200 8
201 1
202 1

The applied traffic rate configuration is:

# ...
health_config:
  rate:
   - namespace: "alpha"
     tolerance:
       - code: "404"
         failure: 10
         protocol: "http"
       - code: "[45]\\d[^\\D4]"
         protocol: "http"
   - namespace: "beta"
     tolerance:
       - code: "[4]\\d\\d"
         degraded: 30
         failure: 40
         protocol: "http"
       - code: "[5]\\d\\d"
         protocol: "http"
# ...

After Kiali adds default configuration we have the following (Debug Info Kiali):

{
  "healthConfig": {
    "rate": [
      {
        "namespace": "/alpha/",
        "kind": "/.*/",
        "name": "/.*/",
        "tolerance": [
          {
            "code": "/404/",
            "degraded": 0,
            "failure": 10,
            "protocol": "/http/",
            "direction": "/.*/"
          },
          {
            "code": "/[45]\\d[^\\D4]/",
            "degraded": 0,
            "failure": 0,
            "protocol": "/http/",
            "direction": "/.*/"
          }
        ]
      },
      {
        "namespace": "/beta/",
        "kind": "/.*/",
        "name": "/.*/",
        "tolerance": [
          {
            "code": "/[4]\\d\\d/",
            "degraded": 30,
            "failure": 40,
            "protocol": "/http/",
            "direction": "/.*/"
          },
          {
            "code": "/[5]\\d\\d/",
            "degraded": 0,
            "failure": 0,
            "protocol": "/http/",
            "direction": "/.*/"
          }
        ]
      },
      {
        "namespace": "/.*/",
        "kind": "/.*/",
        "name": "/.*/",
        "tolerance": [
          {
            "code": "/^5\\d\\d$/",
            "degraded": 0,
            "failure": 10,
            "protocol": "/http/",
            "direction": "/.*/"
          },
          {
            "code": "/^4\\d\\d$/",
            "degraded": 10,
            "failure": 20,
            "protocol": "/http/",
            "direction": "/.*/"
          },
          {
            "code": "/^[1-9]$|^1[0-6]$/",
            "degraded": 0,
            "failure": 10,
            "protocol": "/grpc/",
            "direction": "/.*/"
          }
        ]
      }
    ]
  }
}

What are we applying?

  • For namespace alpha, all resources

  • Protocol http if % requests with error code 404 are >= 10 then FAILURE, if they are > 0 then DEGRADED

  • Protocol http if % requests with others error codes are> 0 then FAILURE.

  • For namespace beta, all resources

  • Protocol http if % requests with error code 4xx are >= 40 then FAILURE, if they are >= 30 then DEGRADED

  • Protocol http if % requests with error code 5xx are > 0 then FAILURE

  • For other namespaces kiali apply defaults.

  • Protocol http if % requests with error code 5xx are >= 20 then FAILURE, if they are >= 0.1 then DEGRADED

  • Protocol grpc if % requests with error code match /[1-9]$|1[0-6]$/ are >= 20 then FAILURE, if they are >= 0.1 then DEGRADED

Alpha Beta