# Kubernetes Access Troubleshooting

This page describes common issues with Kubernetes and how to resolve them.

## An agent failed to join a cluster due to "no authorities for hostname"

### Symptoms

The agent can't rejoin the Teleport cluster after restart and reports an error similar to:

```
ssh: handshake failed: ssh: no authorities for hostname

```

### Explanation

Teleport uses certificate authorities (CAs) to sign certificates for each component. When the component joins the cluster for the first time, it receives a certificate signed by the cluster's CA and stores it in its state directory. When the component restarts, it uses the certificate stored in its state directory to join the cluster again.

This error occurs when the component tries to rejoin the Teleport cluster, but the cluster's CA has changed and the component's certificate is no longer valid.

It can happen when the cluster's CA is rotated or when the cluster is recreated or renamed.

### Resolution

The agent's state needs to be reset, so it can request a new certificate and join the cluster again.

The process for deleting the agent's state depends on whether the agent is running inside or outside of Kubernetes.

#### Agents running outside of Kubernetes (standalone)

If the agent is running outside of Kubernetes, the state directory is located at `/var/lib/teleport/proc` by default. You can delete the state directory with the following command:

```
sudo rm -rf /var/lib/teleport/proc
```

And then restart the agent:

```
sudo systemctl restart teleport
```

#### Agents running in Kubernetes (`teleport-kube-agent`)

Starting in Teleport 11, the `teleport-kube-agent` pod's state is stored in a Kubernetes Secret named `{pod-name}-state` in the installation namespace. To delete the state, follow the steps below:

```
# Get the secrets for the teleport-kube-agent pods
$ kubectl get secret -o name -n teleport-agent | grep "state"
teleport-agent-0-state
teleport-agent-1-state

# Delete the secrets
$ kubectl delete secret -n teleport-agent teleport-agent-0-state teleport-agent-1-state
```

If you're mounting `/var/lib/teleport` into the container, please clean the contents of `/var/lib/teleport/proc` inside the container and then restart the container.
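
For example, assuming the agent runs as a StatefulSet named `teleport-agent` in the `teleport-agent` namespace (as in the commands above) and that the container image provides a shell, you could clear the mounted state with something like:

```
# Hypothetical pod names; adjust them to match your deployment
$ kubectl exec -n teleport-agent teleport-agent-0 -- rm -rf /var/lib/teleport/proc
$ kubectl exec -n teleport-agent teleport-agent-1 -- rm -rf /var/lib/teleport/proc
```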

Once the state is deleted, restart each agent pod.
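
One way to do that, assuming the default StatefulSet name used above, is to delete the pods and let the StatefulSet controller recreate them:

```
# The controller recreates the pods, which rejoin the cluster with fresh certificates
$ kubectl delete pods -n teleport-agent teleport-agent-0 teleport-agent-1
```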

## Unable to connect to GKE Autopilot clusters

### Symptoms

After configuring a GKE Autopilot cluster in Teleport, all attempts to retrieve Kubernetes objects fail with an error similar to:

```
GKE Warden authz [denied by user-impersonation-limitation]: impersonating system identities are not allowed
```

Or the following:

```
GKE Autopilot denies requests that impersonate "system:masters" group
```

> **Note:** This issue only affects GKE Autopilot clusters. If you're using a standard GKE cluster, this issue doesn't apply to you.

### Explanation

Unlike standard GKE clusters, Autopilot clusters forbid requests that impersonate system identities. This security feature prevents users who can impersonate users or groups from gaining access to the cluster's control plane and performing administrative actions. [Learn more about GKE Autopilot security](https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-security#built-in-security).

Teleport uses impersonation to retrieve Kubernetes objects on behalf of the user. This is done by sending a request to the Kubernetes API server with the user's identity in the `Impersonate-User` header and all the Kubernetes Groups they can use in the `Impersonate-Group` header.
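
As a rough sketch (not Teleport's exact request), an impersonated call to the Kubernetes API looks like the following; the API server address, bearer token, and identity values are placeholders:

```
# Hypothetical example: list pods while impersonating a user and a group
$ curl -sk "https://kubernetes.example.com:6443/api/v1/namespaces/default/pods" \
    -H "Authorization: Bearer <token>" \
    -H "Impersonate-User: alice@example.com" \
    -H "Impersonate-Group: system:masters"
```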

Since `system:masters` is a built-in Kubernetes group in all clusters, administrators commonly use it to gain administrative access to the cluster's control plane. However, Autopilot clusters forbid impersonating this group and require that a different group be used instead.

### Resolution

Per the description above, the solution is to configure a different group for impersonation in Teleport. This can be done by setting the role's `kubernetes_groups` parameter to a group that the Autopilot cluster allows you to impersonate.

The `teleport-kube-agent` chart configures a Kubernetes Group with the same access levels as the `system:masters` group when it detects the target cluster is a GKE cluster. This group is named, by default, `cluster-admin`, but the name can be changed by setting the `adminClusterRoleBinding.name` parameter.

The Kubernetes Group isn't created automatically when installing the chart on non-GKE clusters, so don't change the `kubernetes_groups` parameter to `cluster-admin` unless you created the group manually or installed the chart with the `adminClusterRoleBinding.create` parameter set to `true`.
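
If you do create the group manually on a non-GKE cluster, a minimal sketch looks like the following; the binding name `teleport-cluster-admin` is arbitrary, the subject is the group referenced by `kubernetes_groups`, and the `roleRef` reuses the built-in `cluster-admin` ClusterRole:

```
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: teleport-cluster-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin        # built-in ClusterRole with full cluster access
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: cluster-admin      # the group name used in kubernetes_groups
```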

It's important to note that the group must be configured in the cluster before it can be used for impersonation. If the group is not configured, the impersonation request will fail with a `403 Forbidden` error and the user will not be able to access the cluster.

If you opt to continue using the `system:masters` group for impersonation in non-Autopilot clusters, you must ensure that the Teleport roles that grant `system:masters` access can't be used to access GKE Autopilot clusters.

As an example, a user with the following role can impersonate the `system:masters` group in any Kubernetes cluster:

```
kind: role
version: v7
metadata:
  name: k8s-admin
spec:
  allow:
    kubernetes_labels:
      '*': '*'
    kubernetes_groups: ["system:masters"]

```

The wildcard `*` in `kubernetes_labels` allows the user to access any Kubernetes cluster in the Teleport cluster. To prevent the user from accessing GKE Autopilot clusters with that role, you can install the `teleport-kube-agent` chart with a label that identifies the cluster as an Autopilot cluster. For example:

```
$ PROXY_ADDR=teleport.example.com:443
$ CLUSTER=cookie
# Create the values.yaml file
$ cat > values.yaml << EOF
authToken: "${TOKEN}"
proxyAddr: "${PROXY_ADDR}"
roles: "kube,app,discovery"
joinParams:
  method: "token"
  tokenName: "${TOKEN}"
kubeClusterName: "${CLUSTER}"
labels:
  "type" : "autopilot"
EOF
# Install the Helm chart with the values.yaml settings
$ helm install teleport-agent teleport/teleport-kube-agent \
  -f values.yaml \
  --create-namespace \
  --namespace=teleport-agent \
  --version 19.0.0-dev
```

Make sure that the Teleport Agent pod is running. You should see one Teleport agent pod with a single ready container:

```
$ kubectl -n teleport-agent get pods
NAME               READY   STATUS    RESTARTS   AGE
teleport-agent-0   1/1     Running   0          32s
```

Now that the cluster is labeled, you can split the `k8s-admin` role into two roles: one that allows access to all non-Autopilot clusters and another that only allows access to Autopilot clusters.

```
kind: role
version: v7
metadata:
  name: k8s-admin-non-gke-autopilot
spec:
  allow:
    kubernetes_labels_expression: 'labels["type"] != "autopilot"'
    kubernetes_groups: ["system:masters"]
---
kind: role
version: v7
metadata:
  name: k8s-admin-gke-autopilot
spec:
  allow:
    kubernetes_labels_expression: 'labels["type"] == "autopilot"'
    kubernetes_groups: ["cluster-admin"]

```

Once the roles are created, you can assign them to users as usual. For the change to take effect immediately, users must log out and log back in.
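
For example, assuming a local Teleport user named `alice` (hypothetical), you could assign the new roles with `tctl`:

```
# Replaces alice's role list; include any other roles she still needs
$ tctl users update alice --set-roles=k8s-admin-non-gke-autopilot,k8s-admin-gke-autopilot
```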

## Unable to exec into a Pod with kubectl 1.30+

### Symptoms

After upgrading `kubectl` to version 1.30 or later, attempts to exec into a Pod fail with an error similar to:

```
pods "<pod_name>" is forbidden: User "<user>" cannot get resource "pods/exec" in API group "" in the namespace "<namespace>"

```

### Explanation

Starting with Kubernetes 1.30, the `kubectl exec` command switched from using the SPDY protocol to the WebSocket protocol for communicating with the Kubernetes API server. This change was made to enhance the performance and reliability of the `kubectl exec` command, as the SPDY protocol was deprecated.

Although WebSocket was intended to be a drop-in replacement for SPDY, it introduced a breaking change for Kubernetes clusters where RBAC was configured to restrict access to the `pods/exec` resource using only the `create` verb. Previously, the SPDY protocol allowed creating a connection using either `GET` (mapped to `get` in Kubernetes RBAC) or `POST` (mapped to `create` in Kubernetes RBAC).

With the WebSocket protocol, the `kubectl exec` command always uses the `GET` method to create a connection. This means that if the RBAC policy permits only the `create` verb, the connection will be denied.

### Resolution

To resolve this issue, update the RBAC policy to allow the `get` verb for the `pods/exec` (sub)resource. This can be done by modifying the `ClusterRole` or `Role` that grants access to the user.

For example, if you have a `ClusterRole` that grants access to the `pods/exec` resource with the `create` verb, update it to allow the `get` verb as well:

```
- apiGroups: [""]
  resources: ["pods/exec"]
  verbs: ["create", "get"]

```

Once the `ClusterRole` is updated, you should be able to exec into Pods using `kubectl` version 1.30 or later.
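
To verify the change, you can check both verbs against the `exec` subresource with `kubectl auth can-i` (the `--subresource` flag is available in recent kubectl versions):

```
# Both commands should print "yes" after the RBAC update
$ kubectl auth can-i create pods --subresource=exec
$ kubectl auth can-i get pods --subresource=exec
```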

## Health Check: Unable to contact the Kubernetes cluster

### Symptoms

A Kubernetes cluster is reported as `unhealthy`, and all calls between Teleport and the Kubernetes cluster API fail.

### Explanation

A Teleport Kubernetes Service performed multiple health checks on a Kubernetes cluster and did not receive a response.

Calls to Kubernetes cluster API endpoints `/selfsubjectaccessreviews` and `/readyz` returned errors without HTTP status codes.

### Resolution

Examine the Kubernetes cluster and network for root causes.

Check the state of the Kubernetes cluster.

```
kubectl cluster-info
kubectl get --raw /readyz

```

Check network and firewall rules (a basic connectivity check is sketched below):

- Verify that the network is configured properly.
- Check security groups.
- Check network policies.
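
For example, from the machine running the Teleport Kubernetes Service you can check that the Kubernetes API endpoint is reachable at all; the address below is a placeholder for your cluster's API server:

```
# Even a 401/403 response proves network reachability; a timeout points at network or firewall issues
curl -vk https://kubernetes.example.com:6443/readyz
```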

## Health Check: Missing required Kubernetes permissions

### Symptoms

A Kubernetes cluster is reported as `unhealthy`, and is missing one or more Kubernetes permissions such as `impersonate users`, `impersonate groups`, `impersonate service accounts` and/or `get pods`.

### Explanation

A Teleport Kubernetes Service performed a health check on a Kubernetes cluster, and received a response saying required Kubernetes cluster capabilities are denied.

The Kubernetes cluster is missing a `ClusterRole`, or is configured with a partial `ClusterRole`. Teleport uses the Kubernetes [SelfSubjectAccessReview](https://kubernetes.io/docs/reference/kubernetes-api/authorization-resources/self-subject-access-review-v1/) API to verify that it has four permissions:

- Impersonate users
- Impersonate groups
- Impersonate service accounts
- Get pods

### Resolution

Add a `ClusterRole` and `ClusterRoleBinding` to the Kubernetes cluster with the four permissions.

```
$ kubectl config use-context <your-kube-context>
$ kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: teleport-impersonation
rules:
  - apiGroups:
      - ""
    resources:
      - users
      - groups
      - serviceaccounts
    verbs:
      - impersonate
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - get
  - apiGroups:
      - "authorization.k8s.io"
    resources:
      - selfsubjectaccessreviews
    verbs:
      - create
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: teleport
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: teleport-impersonation
subjects:
  - kind: ServiceAccount
    name: teleport-serviceaccount
    namespace: default
EOF

```

Your `ClusterRoleBinding` might vary; adjust the subject's `ServiceAccount` name and namespace to match your Teleport agent deployment.

Also see [enabling impersonation](https://goteleport.com/docs/enroll-resources/kubernetes-access/controls.md) in Teleport Kubernetes.

## Health Check: Unhealthy Kubernetes cluster detected with status code

### Symptoms

A Kubernetes cluster is reported as `unhealthy`, and Teleport received a response from the Kubernetes cluster with an error status code.

### Explanation

A Teleport Kubernetes Service performed multiple health checks on a Kubernetes cluster and received responses.

A call to the permission endpoint `/selfsubjectaccessreviews` failed, and a call to the Kubernetes cluster `/readyz` endpoint returned a response with an error status code. The error status code indicates that a component in the Kubernetes cluster is in an unhealthy state.

### Resolution

Diagnose the Kubernetes cluster error status code with `kubectl`.

```
kubectl cluster-info
kubectl get --raw /readyz

```
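
The verbose form of the readiness endpoint reports the status of each individual check, which can help pinpoint the failing component:

```
kubectl get --raw '/readyz?verbose'
```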

## Health Check: Unable to check Kubernetes permissions

### Symptoms

A Kubernetes cluster is reported as `unhealthy`, and Teleport received mixed responses from health check endpoints.

### Explanation

A Teleport Kubernetes Service performed multiple health checks on a Kubernetes cluster and received responses.

A call to the permission endpoint `/selfsubjectaccessreviews` failed, and a call to the Kubernetes cluster `/readyz` succeeded. Kubernetes cluster components are operational and something is blocking calls to the Kubernetes [SelfSubjectAccessReview](https://kubernetes.io/docs/reference/kubernetes-api/authorization-resources/self-subject-access-review-v1/) API.

### Resolution

Diagnose the Kubernetes cluster with `kubectl`.

```
kubectl cluster-info
kubectl get --raw /readyz

```
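
Because `kubectl auth can-i` issues a SelfSubjectAccessReview under the hood, running it with the same credentials the Teleport Kubernetes Service uses is a quick way to confirm whether that API is reachable:

```
# Each command creates a SelfSubjectAccessReview; a yes/no answer means the API works,
# while an error points at whatever is blocking the selfsubjectaccessreviews endpoint
kubectl auth can-i impersonate users
kubectl auth can-i get pods
```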
