Month: February 2020

A Journey of Troubleshooting K8s

We have a k8s cluster set up, but noticed that a few of our pods were stuck in the Error, Terminating or ContainerCreating state.

How do we figure out what caused these errors, and how do we correct them so that every pod reaches the Running status?

We are running this on k8s version 1.17:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-13T18:07:54Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.1", GitCommit:"d224476cd0730baca2b6e357d144171ed74192d6", GitTreeState:"clean", BuildDate:"2020-01-14T20:56:50Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}

Current Status

Let's see what our current status is...

$ kubectl get pods -A
NAMESPACE              NAME                                                READY   STATUS              RESTARTS   AGE
default                hello-world-77b74d7cc8-f6p69                        0/1     Error               0          39d
default                hello-world-77b74d7cc8-fr6xp                        0/1     Error               0          39d
kube-system            contrail-agent-7cl7b                                0/2     Terminating         14         28d
kube-system            contrail-kube-manager-btmt9                         0/1     Terminating         10         28d
kube-system            coredns-6955765f44-2d26k                            0/1     Error               5          39d
kube-system            coredns-6955765f44-6q7c7                            0/1     Error               3          39d
kube-system            etcd-kubernetes-cluster-master                      1/1     Running             20         39d
kube-system            kube-apiserver-kubernetes-cluster-master            1/1     Running             24         39d
kube-system            kube-controller-manager-kubernetes-cluster-master   1/1     Running             18         39d
kube-system            kube-flannel-ds-amd64-29wf6                         1/1     Running             4          39d
kube-system            kube-flannel-ds-amd64-6845b                         1/1     Running             2          39d
kube-system            kube-flannel-ds-amd64-v6wpq                         1/1     Running             1          39d
kube-system            kube-proxy-cmgxr                                    1/1     Running             1          39d
kube-system            kube-proxy-qrnlg                                    1/1     Running             1          39d
kube-system            kube-proxy-zp2t2                                    1/1     Running             4          39d
kube-system            kube-scheduler-kubernetes-cluster-master            1/1     Running             20         39d
kube-system            metrics-server-694db48df9-46cgs                     0/1     ContainerCreating   0          18d
kubernetes-dashboard   dashboard-metrics-scraper-76585494d8-th7f6          0/1     Error               0          39d
kubernetes-dashboard   kubernetes-dashboard-5996555fd8-n9v2q               0/1     Error               28         39d
olm                    catalog-operator-64b6b59c4f-7qkpj                   0/1     ContainerCreating   0          41m
olm                    olm-operator-844fb69f58-hfdjk                       0/1     ContainerCreating   0          41m
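
Side note: if you only want to list the problem pods, a field selector trims the output. It will not catch everything (Terminating pods still report a Running phase), but it is a quick first filter:

$ kubectl get pods -A --field-selector=status.phase!=Running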

Determine the Reason for Pod Failure

So I searched for how to find the reason for a pod failure, and the Kubernetes docs provided the answer.

Get info about a pod (remember to set the namespace):

$ kubectl get pod contrail-agent-7cl7b -n kube-system
NAME                   READY   STATUS        RESTARTS   AGE
contrail-agent-7cl7b   0/2     Terminating   14         28d
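
Tip: kubectl describe pod prints the recent Events for the pod at the bottom of its output, which often spell out the failure reason directly:

$ kubectl describe pod contrail-agent-7cl7b -n kube-system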

Get more detailed info:

kubectl get pod contrail-agent-7cl7b -n kube-system --output=yaml
...

and check the status:

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-01-28T15:01:22Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-01-28T15:50:13Z"
    message: 'containers with unready status: [contrail-vrouter-agent contrail-agent-nodemgr]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-01-28T15:50:13Z"
    message: 'containers with unready status: [contrail-vrouter-agent contrail-agent-nodemgr]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-01-28T14:46:11Z"
    status: "True"
    type: PodScheduled
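
If you only care about the Ready condition, a jsonpath query pulls out just its message (same pod as above):

$ kubectl get pod contrail-agent-7cl7b -n kube-system \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'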

Try checking lastState to see if there is an error message, and also check the exit code of the container:

  containerStatuses:
  - containerID: docker://10d2e3518108cd47f36d812e2c33fa62d89743b4482dae77fd2e87fc09536140
    image: docker.io/opencontrailnightly/contrail-nodemgr:latest
    imageID: docker-pullable://docker.io/opencontrailnightly/contrail-nodemgr@sha256:3a73ee7cc262fe0f24996b7f910f7c135a143f3a94874bf9ce8c125ae26368d3
    lastState: {}
    name: contrail-agent-nodemgr
    ready: false
    restartCount: 1
    started: false
    state:
      terminated:
        exitCode: 0
        finishedAt: null
        startedAt: null

Unfortunately not much for us here.
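
For next time: to dump each container's name and state (which includes any exit code) without scrolling through the full YAML, a jsonpath range like this should work:

$ kubectl get pod contrail-agent-7cl7b -n kube-system \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.state}{"\n"}{end}'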

So now let's try getting the logs for the container in the pod:

$ kubectl logs contrail-agent-7cl7b contrail-vrouter-agent -n kube-system
Error from server (BadRequest): container "contrail-vrouter-agent" in pod "contrail-agent-7cl7b" is terminated

Looks like you cannot get the logs from a terminated container.
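
If the container had merely restarted (rather than the whole pod being torn down), the --previous flag can sometimes still fetch the logs of the last terminated instance:

$ kubectl logs contrail-agent-7cl7b -c contrail-vrouter-agent -n kube-system --previous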

Either way, this is why an external log aggregator - like Fluentd - is recommended (Prometheus collects metrics rather than logs).

How to connect to your remote Kubernetes cluster with kubectl from your local machine?

You've just set up your Kubernetes cluster. Excellent! Now you want to start deploying your specs...but they live in a repo on your local machine.

All good, let's set up your kubeconfig file so you can connect to your k8s API with kubectl.

  1. Log into your server

  2. Create a service account spec (save it as server-account.yaml):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kube-system
  3. Create the account:

    kubectl create -f server-account.yaml
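
A quick sanity check that the account now exists (not part of the original steps, but cheap to run):

kubectl get serviceaccount admin-user -n kube-system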

  4. Create the cluster role binding spec (save it as admin-role-binding.yml):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kube-system
  5. Apply the role binding:

kubectl apply -f admin-role-binding.yml
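
You can verify the binding grants what you expect by impersonating the service account; since it is bound to cluster-admin, the answer should be yes:

kubectl auth can-i '*' '*' --as=system:serviceaccount:kube-system:admin-user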

  6. Find the secrets used by the service account:

kubectl describe serviceaccounts admin-user -n kube-system
Name:                devacc
Namespace:           default
Labels:              <none>
Annotations:         <none>
Image pull secrets:  <none>
Mountable secrets:   devacc-token-47p8n
Tokens:              devacc-token-47p8n
Events:              <none>
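
If you prefer a one-liner, the secret name can be pulled out with jsonpath (assuming the account lives in kube-system as created above):

kubectl get serviceaccount admin-user -n kube-system -o jsonpath='{.secrets[0].name}'
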
  7. Fetch the token:

kubectl describe secrets devacc-token-47p8n

Keep the token handy; you will need it for the kubeconfig file.
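
Alternatively, the token can be decoded straight from the secret (substitute the secret name and namespace your account actually uses):

kubectl get secret devacc-token-47p8n -n kube-system -o jsonpath='{.data.token}' | base64 --decode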

  8. Get the certificate info for the cluster:

kubectl config view --flatten --minify > cluster-cert.txt
cat cluster-cert.txt

Copy certificate-authority-data and server from the output.
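
If you are scripting this, both values can also be extracted directly with jsonpath:

kubectl config view --flatten --minify -o jsonpath='{.clusters[0].cluster.certificate-authority-data}'
kubectl config view --flatten --minify -o jsonpath='{.clusters[0].cluster.server}'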

  9. Now you can create your kubeconfig file

Create a file called my-service-account-config.yaml and substitute the values for token, certificate-authority-data and server:

apiVersion: v1
kind: Config
users:
- name: admin-user
  user:
    token: <replace this with token info>
clusters:
- cluster:
    certificate-authority-data: <replace this with certificate-authority-data info>
    server: <replace this with server info>
  name: self-hosted-cluster
contexts:
- context:
    cluster: self-hosted-cluster
    user: admin-user
  name: devacc-context
current-context: devacc-context

  10. Copy the file to $HOME/.kube

  11. Tell kubectl to use that context:

kubectl config --kubeconfig=$HOME/.kube/my-service-account-config.yaml use-context devacc-context

In practice it is better to merge it into your base config at $HOME/.kube/config, so you keep a single kubeconfig and just switch contexts.
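
One way to do that merge (a sketch using the KUBECONFIG environment variable; back up your existing config first):

cp $HOME/.kube/config $HOME/.kube/config.bak
KUBECONFIG=$HOME/.kube/config:$HOME/.kube/my-service-account-config.yaml kubectl config view --flatten > /tmp/merged-config
mv /tmp/merged-config $HOME/.kube/config
kubectl config use-context devacc-context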
