Category: Kubernetes

A Journey of Troubleshooting K8s

We have a k8s cluster set up, but noticed that a few of our pods were in Error, Terminating or ContainerCreating states.

How do we figure out what caused these errors, and how do we correct them so that every pod's status is Running?

We are running this on k8s version 1.17:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-13T18:07:54Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.1", GitCommit:"d224476cd0730baca2b6e357d144171ed74192d6", GitTreeState:"clean", BuildDate:"2020-01-14T20:56:50Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}

Current Status

Let's see what our current status is...

$ kubectl get pods -A
NAMESPACE              NAME                                                READY   STATUS              RESTARTS   AGE
default                hello-world-77b74d7cc8-f6p69                        0/1     Error               0          39d
default                hello-world-77b74d7cc8-fr6xp                        0/1     Error               0          39d
kube-system            contrail-agent-7cl7b                                0/2     Terminating         14         28d
kube-system            contrail-kube-manager-btmt9                         0/1     Terminating         10         28d
kube-system            coredns-6955765f44-2d26k                            0/1     Error               5          39d
kube-system            coredns-6955765f44-6q7c7                            0/1     Error               3          39d
kube-system            etcd-kubernetes-cluster-master                      1/1     Running             20         39d
kube-system            kube-apiserver-kubernetes-cluster-master            1/1     Running             24         39d
kube-system            kube-controller-manager-kubernetes-cluster-master   1/1     Running             18         39d
kube-system            kube-flannel-ds-amd64-29wf6                         1/1     Running             4          39d
kube-system            kube-flannel-ds-amd64-6845b                         1/1     Running             2          39d
kube-system            kube-flannel-ds-amd64-v6wpq                         1/1     Running             1          39d
kube-system            kube-proxy-cmgxr                                    1/1     Running             1          39d
kube-system            kube-proxy-qrnlg                                    1/1     Running             1          39d
kube-system            kube-proxy-zp2t2                                    1/1     Running             4          39d
kube-system            kube-scheduler-kubernetes-cluster-master            1/1     Running             20         39d
kube-system            metrics-server-694db48df9-46cgs                     0/1     ContainerCreating   0          18d
kubernetes-dashboard   dashboard-metrics-scraper-76585494d8-th7f6          0/1     Error               0          39d
kubernetes-dashboard   kubernetes-dashboard-5996555fd8-n9v2q               0/1     Error               28         39d
olm                    catalog-operator-64b6b59c4f-7qkpj                   0/1     ContainerCreating   0          41m
olm                    olm-operator-844fb69f58-hfdjk                       0/1     ContainerCreating   0          41m
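With a list this long it helps to pull out only the pods that are not Running. The following is a minimal sketch that parses the plain-text table; the function name `unhealthy_pods` and the trimmed `SAMPLE` are made up for illustration, and it assumes the six-column layout printed by kubectl 1.17 (newer versions may add text to the RESTARTS column). Note that it would also flag pods in a `Completed` state, which is usually harmless.

```python
def unhealthy_pods(table: str):
    """Return (namespace, name, status) for every pod whose STATUS is not Running."""
    rows = []
    for line in table.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a full 6-column pod row.
        if len(fields) != 6 or fields[0] == "NAMESPACE":
            continue
        namespace, name, _ready, status, _restarts, _age = fields
        if status != "Running":
            rows.append((namespace, name, status))
    return rows


# A few rows copied from the output above (trimmed for brevity).
SAMPLE = """\
NAMESPACE     NAME                            READY   STATUS              RESTARTS   AGE
default       hello-world-77b74d7cc8-f6p69    0/1     Error               0          39d
kube-system   kube-proxy-cmgxr                1/1     Running             1          39d
olm           olm-operator-844fb69f58-hfdjk   0/1     ContainerCreating   0          41m
"""

for namespace, name, status in unhealthy_pods(SAMPLE):
    print(f"{namespace}/{name}: {status}")
```

In practice the table would come from the output of `kubectl get pods -A` instead of a hard-coded string.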

Determine the Reason for Pod Failure

So I did a search to see how to find the reason for a pod failure, and the Kubernetes docs provided the answer.

Get info about a pod (remember to set the namespace):

$ kubectl get pod contrail-agent-7cl7b -n kube-system
NAME                   READY   STATUS        RESTARTS   AGE
contrail-agent-7cl7b   0/2     Terminating   14         28d

Get more detailed info:

kubectl get pod contrail-agent-7cl7b -n kube-system --output=yaml

and check the status:

  - lastProbeTime: null
    lastTransitionTime: "2020-01-28T15:01:22Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-01-28T15:50:13Z"
    message: 'containers with unready status: [contrail-vrouter-agent contrail-agent-nodemgr]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-01-28T15:50:13Z"
    message: 'containers with unready status: [contrail-vrouter-agent contrail-agent-nodemgr]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-01-28T14:46:11Z"
    status: "True"
    type: PodScheduled
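Reading the conditions by eye gets tedious when there are many pods. As a rough sketch, the same check can be done on the JSON form of the pod (what `--output=json` would give you). The helper name `failing_conditions` and the trimmed `POD_JSON` are illustrative only.

```python
import json


def failing_conditions(pod: dict):
    """Return (type, reason, message) for each pod condition whose status is not "True"."""
    conditions = pod.get("status", {}).get("conditions", [])
    return [
        (c["type"], c.get("reason", ""), c.get("message", ""))
        for c in conditions
        if c.get("status") != "True"
    ]


# A trimmed copy of the conditions shown above, written as JSON.
POD_JSON = """
{"status": {"conditions": [
  {"type": "Initialized", "status": "True"},
  {"type": "Ready", "status": "False", "reason": "ContainersNotReady",
   "message": "containers with unready status: [contrail-vrouter-agent contrail-agent-nodemgr]"},
  {"type": "PodScheduled", "status": "True"}
]}}
"""

for cond_type, reason, message in failing_conditions(json.loads(POD_JSON)):
    print(f"{cond_type}: {reason} ({message})")
```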

Next, check the lastState to see if there is an error message, and also check the exit code of the container:

  - containerID: docker://10d2e3518108cd47f36d812e2c33fa62d89743b4482dae77fd2e87fc09536140
    imageID: docker-pullable://
    lastState: {}
    name: contrail-agent-nodemgr
    ready: false
    restartCount: 1
    started: false
    state:
      terminated:
        exitCode: 0
        finishedAt: null
        startedAt: null

Unfortunately not much for us here.

So now try getting the logs for the container in the pod...with:

$ kubectl logs contrail-agent-7cl7b contrail-vrouter-agent -n kube-system
Error from server (BadRequest): container "contrail-vrouter-agent" in pod "contrail-agent-7cl7b" is terminated

Looks like you cannot get the logs from a terminated pod.

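This is why shipping logs to an external log aggregator (for example Fluentd with Elasticsearch, or Loki) is recommended. Prometheus is often mentioned alongside these, but it collects metrics rather than logs.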

How to connect to your remote Kubernetes cluster with kubectl from your local machine?

You've just set up your Kubernetes cluster. Excellent, now you want to start deploying your specs...but they are in a repo on your local machine.

All good, let's set up your kubeconfig file so you can connect to your k8s API with kubectl.

  1. Log into your server

  2. Create a service account spec:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kube-system

  3. Create the account:

    kubectl create -f server-account.yaml

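  4. Create the cluster role binding:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kube-system

  5. Apply the role binding: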

kubectl apply -f admin-role-binding.yml

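  6. Find the secrets used by the service account. The sample output below was captured for an account named devacc in the default namespace, so the names differ from admin-user:

kubectl describe serviceAccounts admin-user -n kube-system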
Name:                devacc
Namespace:           default
Labels:              <none>
Annotations:         <none>
Image pull secrets:  <none>
Mountable secrets:   devacc-token-47p8n
Tokens:              devacc-token-47p8n
Events:              <none>
  7. Fetch the token (add -n with the account's namespace if it is not in default):

kubectl describe secrets devacc-token-47p8n

Keep the token; it goes into the kubeconfig file.

  8. Get the certificate info for the cluster:

kubectl config view --flatten --minify > cluster-cert.txt
cat cluster-cert.txt

Copy certificate-authority-data and server from the output.

  9. Now you can create your kubeconfig file

Create a file called my-service-account-config.yaml and substitute the values for token, certificate-authority-data and server:

apiVersion: v1
kind: Config
users:
- name: admin-user
  user:
    token: <replace this with token info>
clusters:
- cluster:
    certificate-authority-data: <replace this with certificate-authority-data info>
    server: <replace this with server info>
  name: self-hosted-cluster
contexts:
- context:
    cluster: self-hosted-cluster
    user: admin-user
  name: devacc-context
current-context: devacc-context

  10. Copy the file to $HOME/.kube

  11. Tell kubectl to use that context:

kubectl config --kubeconfig=$HOME/.kube/my-service-account-config.yaml use-context devacc-context

It is better to merge these entries into your base config ($HOME/.kube/config) than to keep a separate file.
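A kubeconfig only works when its names line up: the user and cluster named in each context must exist, and current-context must name a real context. Here is a small sketch that builds the same structure as a Python dict and checks those references. The function names `build_kubeconfig` and `check_references` are my own, the placeholder values are not real credentials, and it relies on kubectl being able to read a JSON kubeconfig (JSON is a subset of YAML), which is worth verifying on your setup.

```python
import json


def build_kubeconfig(server, ca_data, token, user="admin-user",
                     cluster="self-hosted-cluster", context="devacc-context"):
    """Build a kubeconfig dict with one user, one cluster and one context."""
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "users": [{"name": user, "user": {"token": token}}],
        "clusters": [{
            "name": cluster,
            "cluster": {"certificate-authority-data": ca_data, "server": server},
        }],
        "contexts": [{"name": context, "context": {"cluster": cluster, "user": user}}],
        "current-context": context,
    }


def check_references(config):
    """Return a list of dangling name references (empty when consistent)."""
    users = {u["name"] for u in config["users"]}
    clusters = {c["name"] for c in config["clusters"]}
    contexts = {c["name"] for c in config["contexts"]}
    problems = []
    for ctx in config["contexts"]:
        if ctx["context"]["user"] not in users:
            problems.append(f"context {ctx['name']}: unknown user {ctx['context']['user']}")
        if ctx["context"]["cluster"] not in clusters:
            problems.append(f"context {ctx['name']}: unknown cluster {ctx['context']['cluster']}")
    if config["current-context"] not in contexts:
        problems.append(f"unknown current-context {config['current-context']}")
    return problems


# Placeholder values; substitute the real server, CA data and token.
config = build_kubeconfig("<server>", "<certificate-authority-data>", "<token>")
print(json.dumps(config, indent=2))
print(check_references(config))
```

A mistyped context name or a context pointing at a user that is not defined shows up as an entry in the returned list.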


Kubernetes Questions – Please answer them

What is the difference between a Persistent Volume and a Storage Class?

What happens when pods are killed? Is the data persisted? How do you test this?

What is the difference between a Service and an Ingress?

By default, Docker uses host-private networking, so containers can talk to other containers only if they are on the same machine.

If you look up the pod IP and the pod has an open containerPort, you should be able to reach it from the node with curl.

What happens when a node dies? The pods die with it, and the Deployment will create new ones, with different IPs. This is the problem a Service solves.

A Kubernetes Service is an abstraction which defines a logical set of Pods running somewhere in your cluster that all provide the same functionality.

When created, each Service is assigned a unique IP address (also called the clusterIP).

This address is tied to the lifespan of the Service, and will not change while the Service is alive.

Communication to the Service will be automatically load-balanced.

  • targetPort: is the port the container accepts traffic on
  • port: is the abstracted Service port, which can be any port other pods use to access the Service
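To make the two port fields concrete, here is a minimal sketch that builds a Service manifest as a Python dict and prints it as JSON. The helper `service_manifest`, the `app: hello-world` selector label and the port numbers 80 and 8080 are all assumptions picked for illustration.

```python
import json


def service_manifest(name, selector, port, target_port):
    """Build a Service that exposes `port` and forwards to `target_port` on the pods."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            # Pods carrying these labels receive the traffic.
            "selector": selector,
            "ports": [{
                "protocol": "TCP",
                "port": port,              # what other pods connect to
                "targetPort": target_port, # what the container listens on
            }],
        },
    }


manifest = service_manifest("hello-world", {"app": "hello-world"}, port=80, target_port=8080)
print(json.dumps(manifest, indent=2))
```

Other pods would then talk to the Service on port 80, while the container keeps listening on 8080.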

Note that the Service IP is completely virtual; it never hits the wire.

Kubernetes supports 2 primary modes of finding a Service: environment variables and DNS. DNS requires a cluster DNS add-on such as CoreDNS.
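For the environment variable mode, the variable names are derived from the Service name: it is upper-cased and dashes become underscores, giving `{SVCNAME}_SERVICE_HOST` and `{SVCNAME}_SERVICE_PORT`. A small sketch of that naming rule follows; the function name `service_env_vars` and the cluster IP are made up, and the additional Docker-links style variables that Kubernetes also sets are left out.

```python
def service_env_vars(service_name: str, cluster_ip: str, port: int) -> dict:
    """Return the two basic discovery variables a pod would see for a Service."""
    # Upper-case the name and turn dashes into underscores.
    prefix = service_name.upper().replace("-", "_")
    return {
        f"{prefix}_SERVICE_HOST": cluster_ip,
        f"{prefix}_SERVICE_PORT": str(port),
    }


# Hypothetical Service name and cluster IP, for illustration only.
for key, value in service_env_vars("hello-world", "10.0.0.11", 80).items():
    print(f"{key}={value}")
```

These variables are only injected into containers that start after the Service exists.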

Ingress is...

An API object that manages external access to the services in a cluster, typically HTTP

How do you know the size of the PVs to create for the PVCs of a Helm chart?

Are Helm charts declarative or imperative?

What is a Kubernetes operator?

How do you start a new mysql docker container with an existing data directory?

Usage against an existing database

If you start your mysql container instance with a data directory that already contains a database (specifically, a mysql subdirectory), the $MYSQL_ROOT_PASSWORD variable should be omitted from the run command line; it will in any case be ignored, and the pre-existing database will not be changed in any way.

The above did not work for me.

trademate-db_1   | Initializing database
trademate-db_1   | 2020-01-16T05:59:38.689547Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
trademate-db_1   | 2020-01-16T05:59:38.690778Z 0 [ERROR] --initialize specified but the data directory has files in it. Aborting.
trademate-db_1   | 2020-01-16T05:59:38.690818Z 0 [ERROR] Aborting
trademate-db_1   | 
trademate_trademate-db_1 exited with code 1

How do you view the contents of a docker volume?

For named volumes (the ones Docker manages), you can't do this without mounting the volume into a container. So kak...

How do you run python debugger and attach inside a docker container?

It's a mess...just develop locally

If you were mounting a conf file into an nginx image from docker-compose - how do you do that in production? Do you bake it into the image?

Yes you should.

Do something like this:

FROM nginx:1.17

COPY ./config/nginx/conf.d /etc/nginx/conf.d

# Remove default config
RUN rm /etc/nginx/conf.d/default.conf 

How do you deploy all the k8s spec files in a folder at once? If not, is there a specific order to deploy them in?

The Service should exist before the replicas, as Kubernetes adds environment variables to the containers of the ReplicaSet's pods based on the Services that already exist when the pods start.
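`kubectl apply -f` accepts a directory, so a whole folder of specs can be applied with one command. If the order matters to you, one option is to sort the manifests by kind before applying them. The sketch below shows the idea; the `KIND_ORDER` list and the function `apply_order` are my own choices, not an official ordering.

```python
# Kinds earlier in this list are applied first; the ordering is an assumption.
KIND_ORDER = [
    "Namespace",
    "ServiceAccount",
    "ClusterRoleBinding",
    "ConfigMap",
    "Secret",
    "Service",
    "Deployment",
]


def apply_order(manifests):
    """Return manifests sorted so that Services come before Deployments."""
    def rank(manifest):
        kind = manifest.get("kind", "")
        # Unknown kinds go last, keeping their original relative order.
        return KIND_ORDER.index(kind) if kind in KIND_ORDER else len(KIND_ORDER)

    return sorted(manifests, key=rank)


# Hypothetical manifests, reduced to the fields needed for sorting.
specs = [
    {"kind": "Deployment", "metadata": {"name": "hello-world"}},
    {"kind": "Service", "metadata": {"name": "hello-world"}},
    {"kind": "Namespace", "metadata": {"name": "demo"}},
]

for spec in apply_order(specs):
    print(spec["kind"], spec["metadata"]["name"])
```

Relying on DNS for Service discovery avoids the ordering problem, since DNS lookups do not depend on when the pod started.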

Should gunicorn and nginx containers be in the same pod?