Monitoring services

On-Premise services provide metrics in Prometheus format. The metrics endpoints can be exposed over HTTPS or otherwise protected, and the format is easily integrated with most modern monitoring systems.
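
For example, if a service's metrics endpoint is served over HTTPS and protected with basic authentication, the Prometheus scrape job could look like the following sketch. The job name, target address, and credential paths are placeholders for illustration; the actual protection mechanism depends on your installation.

```yaml
scrape_configs:
  - job_name: "onpremise-service"              # illustrative job name
    scheme: https                              # scrape the endpoint over TLS
    metrics_path: /metrics
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt    # CA used to verify the endpoint certificate
    basic_auth:
      username: prometheus
      password_file: /etc/prometheus/secrets/metrics-password
    static_configs:
      - targets: ["catalog-api.example.local:443"]   # placeholder target
```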

Below are recommendations for configuring service monitoring.

Two main monitoring methods are recommended; a PromQL sketch of both is given after the list:

  • For APIs and services in Kubernetes - the RED method:

    • Rate (Request rate)
    • Errors (Error rate)
    • Duration (Request processing time)
  • For databases and storage on virtual machines - the USE method:

    • Utilization (Resource usage)
    • Saturation (Queued or deferred work)
    • Errors (Error count)
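
As an illustration, the RED and USE signals map to PromQL roughly as in the recording-rules sketch below. The metric names are assumptions: http_requests_total and http_request_duration_seconds_bucket stand in for whatever the services actually expose, and the node_* series come from node_exporter; adjust the names and label sets to your installation.

```yaml
groups:
  - name: red-method                 # APIs and services in Kubernetes
    rules:
      - record: service:http_requests:rate5m          # Rate
        expr: sum(rate(http_requests_total[5m])) by (service)
      - record: service:http_errors:ratio5m           # Errors (share of 5xx responses)
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
          / sum(rate(http_requests_total[5m])) by (service)
      - record: service:http_request_duration:p99_5m  # Duration (99th percentile)
        expr: |
          histogram_quantile(0.99,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (service, le))
  - name: use-method                 # databases and storage on virtual machines
    rules:
      - record: instance:cpu_utilization:ratio5m      # Utilization
        expr: 1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance)
      - record: instance:cpu_saturation:load1         # Saturation (load per CPU core)
        expr: node_load1 / count without (cpu, mode) (node_cpu_seconds_total{mode="idle"})
      - record: instance:network_errors:rate5m        # Errors
        expr: rate(node_network_receive_errs_total[5m]) + rate(node_network_transmit_errs_total[5m])
```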

For a complete monitoring cycle (collection, storage, visualization, and alerting), the following tools are recommended:

  • Metrics collection and storage: Prometheus - provides flexible queries (PromQL language) and reliable data storage.
  • Visualization: Grafana - offers dashboards with customizable queries.
  • Alerting: AlertManager - allows sending notifications via e-mail, Slack, and other channels when thresholds are exceeded (see the example configuration after this list).
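
A minimal AlertManager configuration that sends notifications to e-mail and Slack might look like the sketch below; the SMTP relay, addresses, webhook URL, and channel name are placeholders to replace with your own values.

```yaml
global:
  smtp_smarthost: "smtp.example.local:587"       # placeholder SMTP relay
  smtp_from: "alertmanager@example.local"
  slack_api_url: "https://hooks.slack.com/services/XXX/YYY/ZZZ"   # placeholder webhook

route:
  receiver: default                              # all alerts go to e-mail by default
  group_by: ["alertname", "namespace"]
  routes:
    - matchers: ['severity="critical"']          # critical alerts are routed to Slack
      receiver: oncall-slack

receivers:
  - name: default
    email_configs:
      - to: "ops-team@example.local"
  - name: oncall-slack
    slack_configs:
      - channel: "#oncall"
        send_resolved: true
```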

Collecting metrics directly from service pods is recommended. If your Kubernetes cluster uses Prometheus with Kubernetes SD (Service Discovery), add the following annotations to the podAnnotations parameter in the Helm charts of the services (see the example below the list):

  • prometheus.io/scrape: "true" - enable metrics collection from this pod.
  • prometheus.io/path: "/metrics" - the path where metrics are available.
  • prometheus.io/port: "80" - pod port.
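
In a service's Helm values file this typically looks like the following sketch; the metrics port (80 here) and the exact placement of podAnnotations depend on the specific chart, so check the chart's values reference.

```yaml
podAnnotations:
  prometheus.io/scrape: "true"     # enable metrics collection from this pod
  prometheus.io/path: "/metrics"   # path where metrics are available
  prometheus.io/port: "80"         # pod port serving the metrics endpoint
```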

When monitoring API services in Kubernetes, track the following key indicators. They help assess system performance and stability and detect potential issues; example PromQL queries and alert rules are given after the list:

  • Network indicators:

    • RPS (Requests Per Second) - the number of requests per second passing through Ingress or Service. It is important to collect metrics specifically from the Ingress controller, as it shows the total incoming traffic.

    • Latency - request processing latency:

      • p50 - median.
      • p90 - 90th percentile.
      • p99 - 99th percentile. It is important to monitor p99 growth, as it indicates the most critical delays.
    • HTTP response codes:

      • 2xx - successful requests.
      • 4xx - client errors. It is important to monitor increases in 429 errors (Too Many Requests).
      • 5xx - server errors. The number of these errors should be minimal.
  • Container resources (CPU and RAM):

    • CPU Usage (%) - actual CPU resource consumption by the container.
    • CPU Throttling - if the container exceeds CPU limits (limits.cpu), Kubernetes restricts its execution.
    • Memory Usage (RAM, MB/GB) - current memory usage.
    • OOMKills (Out of Memory Kills) - if the container exceeds the memory limit (limits.memory), Kubernetes may terminate it.
  • Node resources:

    • Total CPU and memory load on the node - if the node is overloaded, its pods may be throttled or evicted.
    • Disk utilization - disk space usage and disk performance (IOPS, read/write speed).
  • Network errors and connection errors:

    • Connection Errors - errors in connections between services (e.g., Connection reset, Timeout).
    • DNS Latency & Failures - delays and errors in DNS requests (CoreDNS).
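
As a starting point, the network indicators can be expressed as the PromQL recording rules sketched below. The queries assume the NGINX Ingress Controller metrics (nginx_ingress_controller_requests and nginx_ingress_controller_request_duration_seconds_bucket); other ingress controllers expose differently named series.

```yaml
groups:
  - name: ingress-network-indicators
    rules:
      - record: ingress:requests:rate5m                 # RPS through the Ingress controller
        expr: sum(rate(nginx_ingress_controller_requests[5m])) by (ingress)
      - record: ingress:request_duration:p50_5m         # median latency
        expr: |
          histogram_quantile(0.50,
            sum(rate(nginx_ingress_controller_request_duration_seconds_bucket[5m])) by (ingress, le))
      - record: ingress:request_duration:p99_5m         # watch p99 growth
        expr: |
          histogram_quantile(0.99,
            sum(rate(nginx_ingress_controller_request_duration_seconds_bucket[5m])) by (ingress, le))
      - record: ingress:requests_429:rate5m             # Too Many Requests
        expr: sum(rate(nginx_ingress_controller_requests{status="429"}[5m])) by (ingress)
      - record: ingress:requests_5xx:ratio5m            # share of server errors
        expr: |
          sum(rate(nginx_ingress_controller_requests{status=~"5.."}[5m])) by (ingress)
          / sum(rate(nginx_ingress_controller_requests[5m])) by (ingress)
```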
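
Container and node resource indicators can be turned into alerts along the lines of the following sketch. It assumes that cAdvisor, kube-state-metrics, and node_exporter metrics (container_*, kube_*, node_* series) are already collected; the thresholds are illustrative and should be tuned for your workloads.

```yaml
groups:
  - name: resource-alerts
    rules:
      - alert: HighCpuThrottling            # container regularly hits limits.cpu
        expr: |
          rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m])
          / rate(container_cpu_cfs_periods_total{container!=""}[5m]) > 0.25
        for: 10m
        labels: {severity: warning}
      - alert: ContainerOomKilled           # last termination reason is OOMKilled (limits.memory exceeded)
        expr: kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
        for: 1m
        labels: {severity: critical}
      - alert: NodeDiskAlmostFull           # less than 10% of disk space left on the node
        expr: |
          node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
          / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"} < 0.10
        for: 15m
        labels: {severity: warning}
```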
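
For DNS latency and failures, CoreDNS exposes its own metrics (coredns_dns_request_duration_seconds_bucket and coredns_dns_responses_total); assuming CoreDNS is scraped by Prometheus, they can be watched roughly as follows.

```yaml
groups:
  - name: dns-monitoring
    rules:
      - record: coredns:request_duration:p99_5m     # DNS latency, 99th percentile
        expr: |
          histogram_quantile(0.99,
            sum(rate(coredns_dns_request_duration_seconds_bucket[5m])) by (le))
      - alert: CoreDnsServfailRate                  # share of SERVFAIL responses above 5%
        expr: |
          sum(rate(coredns_dns_responses_total{rcode="SERVFAIL"}[5m]))
          / sum(rate(coredns_dns_responses_total[5m])) > 0.05
        for: 10m
        labels: {severity: warning}
```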