
Guide to Deploying Metrics Server on Linode LKE

TL;DR

There have been many discussions on the Linode LKE forums regarding the installation of the Metrics Server on the platform.

One common issue users face is setting up the insecure TLS and working with outdated Helm charts. This article provides a comprehensive guide to help you overcome these challenges and implement the Metrics Server in your Linode LKE cluster.

What is metrics-server in Kubernetes?

The Metrics Server is a tool for collecting and exposing basic resource usage metrics from Kubernetes nodes and pods.

It is required by several capabilities, including the Horizontal Pod Autoscaler and the Kubernetes Dashboard.

Unlike other monitoring tools such as Prometheus, Metrics Server does not store data long-term or provide advanced querying and alerting capabilities. The data collected is stored in memory and not persisted to disk. This means that the collected metrics are only available as long as the Metrics Server is running, and will be lost if the server is restarted or if the metrics are not retrieved in a timely manner.

However, it is easy to install and configure and has a low resource overhead.

Learn more about how Metrics Server works, how it is implemented, and what its benefits are.

Implementation

Helm Chart to use

For the Metrics Server, I recommend the chart from the Kubernetes Special Interest Group (SIG). It is maintained by a community of contributors, and their projects, such as ingress-nginx, are exceptional.

You can find the Metrics Server chart on the GitHub page: https://github.com/kubernetes-sigs/metrics-server.

By clicking the "Releases" link, you can view the available chart versions.

Another option is to follow the chart releases on Artifact Hub, a central repository of Helm charts: https://artifacthub.io/packages/helm/metrics-server/metrics-server.

There are different ways to implement the Metrics Server in your environment, but I will focus on two methods.

At devoriales.com, we use Kubernetes as the runtime platform for our workloads and leverage CI/CD workflows with Terraform. I'm sharing the Terraform resource related to the Metrics Server, which you can use as a reference.

Manual helm installation

This method assumes that you have the Helm CLI installed on your machine, and that you are authenticated and authorized to perform this action.

helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm repo update

helm upgrade --install metrics-server metrics-server/metrics-server \
--create-namespace --namespace metrics-server \
--set apiService.create=true \
--set 'args={--kubelet-insecure-tls,--kubelet-preferred-address-types=InternalIP}'
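After the release is deployed, you can verify that the deployment has rolled out and that metrics are being served. This is a minimal sketch; the namespace matches the one used in the install command above, so adjust it if you deployed elsewhere.

```shell
# check that the metrics-server deployment has rolled out
kubectl rollout status deployment/metrics-server -n metrics-server

# after a minute or two, node metrics should be available
kubectl top nodes
```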

Terraform 

Here is an example of a similar implementation via Terraform (this time the Metrics Server will be installed in the kube-system namespace):

resource "helm_release" "metrics_server" {
  name       = "metrics-server"
  repository = "https://kubernetes-sigs.github.io/metrics-server/"
  chart      = "metrics-server"
  version    = "3.9.0"
  namespace  = "kube-system"

  set {
    name  = "apiService.create"
    value = "true"
  }

  set {
    name  = "args[0]"
    value = "--kubelet-insecure-tls"
  }

  set {
    name  = "args[1]"
    value = "--kubelet-preferred-address-types=InternalIP"
  }
}
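The resource above is applied with the usual Terraform workflow. This sketch assumes the helm provider is already configured against your LKE kubeconfig:

```shell
# initialize providers, preview the change, then apply the release
terraform init
terraform plan -target=helm_release.metrics_server
terraform apply -target=helm_release.metrics_server
```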

Explanation of Arguments

  • apiService.create: A boolean value that determines whether to create the APIService resource for the metrics-server service.
    You can verify that the metrics-server API service, called v1beta1.metrics.k8s.io, has been created:

    kubectl get apiservice   
    
    NAME                                   SERVICE                      AVAILABLE   AGE
    ...
    v1beta1.metrics.k8s.io                 kube-system/metrics-server   True        47m
    v1beta1.storage.k8s.io                 Local                        True        3h35m
    v1beta2.flowcontrol.apiserver.k8s.io   Local                        True        3h35m
    v1beta3.flowcontrol.apiserver.k8s.io   Local                        True        3h35m
    v2.autoscaling                         Local                        True        3h35m
  • kubelet-insecure-tls: Disables TLS verification between the metrics-server and the kubelet API. This is necessary because Linode currently does not provide signed certificates for internal IP access to nodes. Metrics server needs to access the kubelet API via internal IP, hence the need to disable TLS verification.
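A quick way to check just the Available condition of the API service (the jsonpath below assumes the kube-system install from the Terraform example); if the kubelet TLS settings are wrong, this typically reports False:

```shell
# print only the Available condition status of the metrics API service
kubectl get apiservice v1beta1.metrics.k8s.io \
  -o jsonpath='{.status.conditions[?(@.type=="Available")].status}'
```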

Here is an example of why this is needed. Let's check the nodes in the cluster with kubectl get nodes -o wide

kubectl get nodes -o wide
NAME                            STATUS   ROLES    AGE    VERSION   INTERNAL-IP       EXTERNAL-IP       OS-IMAGE                         KERNEL-VERSION          CONTAINER-RUNTIME
lke106889-159726-645753abe0c9   Ready    <none>   124m   v1.26.3   192.168.153.191   172.104.149.137   Debian GNU/Linux 11 (bullseye)   5.10.0-21-cloud-amd64   containerd://1.6.19
lke106889-159726-645753ac3fc9   Ready    <none>   124m   v1.26.3   192.168.167.150   139.162.138.154   Debian GNU/Linux 11 (bullseye)   5.10.0-21-cloud-amd64   containerd://1.6.19
lke106889-159726-645753ac9fb9   Ready    <none>   123m   v1.26.3   192.168.167.32    172.104.225.199   Debian GNU/Linux 11 (bullseye)   5.10.0-21-cloud-amd64   containerd://1.6.19

We will not be able to reach the nodes via their hostnames, since those are internal. Instead, we can try the external IP address.

The following is a health check endpoint for the kubelet API server, which is responsible for managing and interacting with the containers on a Kubernetes node.

curl -k https://172.104.149.137:10250/healthz
Unauthorized%  <<<< we don't have a certificate to authenticate with

Now we can see that we can reach the node, but since we don't have a valid certificate, we are not authorized.

We can at least check the cert used by kubelet:

openssl s_client -connect 172.104.149.137:10250

The -connect option works on both Linux and macOS; the IP address is the external IP of the node lke106889-159726-645753abe0c9.

 

Output:

CONNECTED(00000003)
depth=1 CN = lke106889-159726-645753abe0c9-ca@1683444762
verify error:num=19:self signed certificate in certificate chain
verify return:0
write W BLOCK
---
Certificate chain
 0 s:/CN=lke106889-159726-645753abe0c9@1683444763
   i:/CN=lke106889-159726-645753abe0c9-ca@1683444762
 1 s:/CN=lke106889-159726-645753abe0c9-ca@1683444762
   i:/CN=lke106889-159726-645753abe0c9-ca@1683444762

As we can see, the certificate's subject contains only the node's internal hostname; no IP addresses are listed.
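You can confirm this directly by inspecting the certificate's Subject Alternative Name entries. This is a sketch using the same external IP as above:

```shell
# fetch the kubelet's certificate and print its SAN entries
openssl s_client -connect 172.104.149.137:10250 </dev/null 2>/dev/null \
  | openssl x509 -noout -text \
  | grep -A1 'Subject Alternative Name'
```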

That is the reason why the metrics-server needs to run with --kubelet-insecure-tls.

The certificate presented by the kubelet does not include the IP address that the metrics-server tries to connect to. This causes SSL/TLS certificate validation to fail, and as a result the connection is not established.

The --kubelet-insecure-tls flag tells metrics-server to skip certificate verification for the kubelet API endpoints. By default, Linode does not provide signed certificates for the nodes' internal IP addresses. I like Linode's services, but in this case there is potential for improvement (as of now).

How does Metrics Server collect data?

Metrics Server gathers CPU and memory usage metrics by periodically scraping the Summary API of the kubelet on each node, covering every node and pod in the cluster.

It then exposes these metrics as Kubernetes API resources through the API aggregation layer, so that clients can request them from the Kubernetes API server. This information is crucial for users to optimize their clusters' performance and improve resource allocation.

Here is an example command to retrieve the Metrics Server metrics using curl:

$ curl -k https://<Kubernetes_API_server>/apis/metrics.k8s.io/v1beta1/nodes

If you want to collect metrics for a specific node:

$ curl -k https://<Kubernetes_API_server>/apis/metrics.k8s.io/v1beta1/nodes/<node_name>
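If you don't want to handle API server authentication yourself, kubectl can issue the same raw Metrics API requests using your kubeconfig credentials:

```shell
# raw Metrics API calls via kubectl (equivalent to the curl commands above)
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods
```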

kubectl equivalent

CPU and memory usage for all pods in the cluster:

kubectl top pods --all-namespaces

Get CPU and memory usage for all nodes in the cluster:

kubectl top nodes

Get CPU and memory usage for a specific pod:

kubectl top pods <pod-name> -n <namespace>

Get CPU and memory usage broken down by container within a specific pod:

kubectl top pods <pod-name> -n <namespace> --containers

How to understand the metrics?

The kubectl top nodes command provides an overview of the CPU and memory usage of all nodes in the Kubernetes cluster. It shows the CPU usage in cores and the memory usage in bytes. This command can be useful to monitor the overall resource utilization of the cluster and identify any nodes that might be experiencing resource constraints.

Output:

kubectl top nodes

NAME                            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
lke106889-159726-645753abe0c9   37m          3%     1110Mi          58%       
lke106889-159726-645753ac3fc9   37m          3%     941Mi           49%       
lke106889-159726-645753ac9fb9   176m         17%    980Mi           52% 
  • CPU(cores): The amount of CPU being used by the node, shown in millicores (m)
  • CPU%: The percentage of the node's total CPU capacity being used
  • MEMORY(bytes): The amount of memory being used by the node, shown in Mi
  • MEMORY%: The percentage of the node's total memory being used


For those new to Kubernetes, millicores may be unfamiliar: a millicore is one-thousandth of a core. To illustrate, a node with 4 cores has 4000 millicores (4 cores x 1000 millicores per core). As a result, if a node uses 37m of CPU, it is utilizing 37/1000 = 0.037 cores, or roughly 3.7% of one core.
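The arithmetic above can be sketched in a one-liner; the 37m reading is taken from the kubectl top output shown earlier:

```shell
# convert a millicore reading into cores and percentage of one core
awk 'BEGIN { usage_m = 37; printf "%.3f cores (%.1f%% of one core)\n", usage_m / 1000, usage_m / 10 }'
# prints: 0.037 cores (3.7% of one core)
```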

Summary

When it comes to implementing the Metrics Server in Linode Kubernetes Engine (LKE), there are some obstacles to overcome. Outdated Helm charts (an issue not unique to LKE) and TLS verification can make the process challenging. However, the Kubernetes Special Interest Group (SIG) maintains a community-supported chart that I strongly recommend.

To install the Metrics Server on LKE, you have to disable TLS verification between the Metrics Server and the kubelet API by setting the --kubelet-insecure-tls flag. This is necessary because Linode does not provide signed certificates for accessing nodes via their internal IPs. We hope this will change in the future.

About the Author

Aleksandro Matejic, a Cloud Architect, began working in the IT industry over 21 years ago as a technical specialist, right after his studies. Since then, he has worked in various companies and industries in various system engineer and IT architect roles. He currently works on designing Cloud solutions, Kubernetes, and other DevOps technologies.

In his spare time, Aleksandro works on different development projects such as developing devoriales.com, a blog and learning platform launching in 2022/2023. In addition, he likes to read and write technical articles about software development and DevOps methods and tools.

You can contact Aleksandro by visiting his LinkedIn Profile
