Managing LXC/LXD Linux Containers with Terraform

Disclaimer: This post will not speak about creating and maintaining LXC clusters and networking.

This post will introduce LXC/LXD and Terraform seperately, as they are independent things. Once you have enough information about what these things are, I will show you how to use them together.

LXC

The lightervisor

LXC is linux containers. A System container manager, offering a similar user experience to virtual machines but using linux containers instead.

LXD and LXC, what the heck?

LXC – Linux Containers, a userspace interface using the host’s kernel – the underlying technology
LXD – An extension to LXC adding a Rest API – an alternative to LXC’s tools – the interface to the tech

Containers vs Virtual Machines

Containers leverage the OS and kernel of the host they are on, that is why containers (on linux) can’t run windows. A Virtual machine hypervisor is heavyweight, as they have reserved resources for their own linux kernel and host operating system. Containers should be faster to deploy and you can pack them more tightly (density) onto hardware.

What Makes LXC different from Docker containers?

LXC is designed to run as a full machine OS (clean copy of a Linux distribution or a full appliance) and give the experience of a full machine OS. LXD doesn’t care what is running in the container, docker does. Docker on the other hand is built to run a single application – they are focused on being process based containers.

Testing out LXC Containers

On Ubuntu 18.04 (Bionic Beaver) LXC/LXD is preinstalled. Take note it is a canonical product, red hat seems to have chosen docker and dropped support for some lxc libs.

lxc is the command line tool.

LXC recommends installing zfsutils-linux, so we can use so we can use zfs storage pools.

sudo apt install zfsutils-linux

To initialise you need to add the user to the lxd group.

sudo adduser vagrant lxd

Initialise LXD, you can leave most as default allow remote access so terraform can access it later on:


vagrant@ubuntu-bionic:~$ lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: 
Do you want to configure a new storage pool? (yes/no) [default=yes]: 
Name of the new storage pool [default=default]: 
Name of the storage backend to use (btrfs, dir, lvm, zfs) [default=zfs]: 
Create a new ZFS pool? (yes/no) [default=yes]: 
Would you like to use an existing block device? (yes/no) [default=no]: 
Size in GB of the new loop device (1GB minimum) [default=15GB]: 5
Would you like to connect to a MAAS server? (yes/no) [default=no]: 
Would you like to create a new local network bridge? (yes/no) [default=yes]: 
What should the new bridge be called? [default=lxdbr0]: 
What IPv4 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 
What IPv6 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 
Would you like LXD to be available over the network? (yes/no) [default=no]: yes
Address to bind LXD to (not including port) [default=all]: 
Port to bind LXD to [default=8443]: 
Trust password for new clients: 
Again: 
Would you like stale cached images to be updated automatically? (yes/no) [default=yes] 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:

View networks, storage:


lxc network list
lxc storage list

Create a container, list all containers and enter the shell of the container (by default it will share the amoutn of memory on the host unless limited):


vagrant@ubuntu-bionic:~$ lxc launch ubuntu:16.04
Creating the container
Container name is: fine-buzzard
Starting fine-buzzard
vagrant@ubuntu-bionic:~$ lxc list
+--------------+---------+----------------------+-----------------------------------------------+------------+-----------+
|     NAME     |  STATE  |         IPV4         |                     IPV6                      |    TYPE    | SNAPSHOTS |
+--------------+---------+----------------------+-----------------------------------------------+------------+-----------+
| fine-buzzard | RUNNING | 10.80.176.232 (eth0) | fd42:f4c6:f849:da44:216:3eff:fec5:2368 (eth0) | PERSISTENT | 0         |
+--------------+---------+----------------------+-----------------------------------------------+------------+-----------+
lxc exec fine-buzzard -- /bin/bash
vagrant@ubuntu-bionic:~$ lxc exec fine-buzzard -- /bin/bash
root@fine-buzzard:~# free -m
              total        used        free      shared  buff/cache   available
Mem:            985          30         894           6          60         954
Swap:             0           0           0

Download a centos image to the local image repo


lxc image copy images:centos/7 local:

What we have done so far is a manual process of managment. A procedure of commands typed with the API via CLI. We aren’t sure of the current state until we check the config. We don’t know the history of how we got to this state.

Let us look at Terraform…

Terraform

Terraform is an infrastructure as code tool. All you need to do to know the current state of you infrastructure is to look at the code. You are not limited to virtualisation tools, the power of terraform comes from the various things you can manage with it, using providers. Check the providers maintained by terraform and the providers available created and maintained by the community.

Provider categories:

Major cloud providers: AWS, Azure, Oracle, GCP, vSphere and vCloud Director
Smaller cloud providers: DigitalOcean, Fastly, Linode, Heroku, Hetzner, OpenStack, OVH
Infrastructure software: Kubernetes, Vault, Rancher, Mailgun, Consul
Network: Cloudflare, DNS, FortiOS, Akamai
Version Control: Github, gitlab and bitcbucket
Monitoring: New relic, Datadog, Grafana
Database: MySQL, Influx, Postgres
Many more…

How is Terraform different from Config Management Tools like Ansible

They are both open source and platform agnostic. They are both agentless, have a decent community. Terraform has not been around as long (2014 vs 2012). Ansible has modules to manage many of the providers mentioned above.

The main difference is that Ansible is procedural – code is step-by-step, how to to achieve some desired end state and Terraform is declarative – code specifies your desired end state.

With Terraform: you are aware of what is already deployed

Eg. Ansible


- ec2:
    count: 10
    image: ami-v1    
    instance_type: t2.micro

vs Terraform


resource "aws_instance" "example" {
  count         = 10
  ami           = "ami-v1"
  instance_type = "t2.micro"
}

Running the above once will produce similar results. The problem is when you want to change something.

Say you wanted to add another 5 servers, with ansible you would need to figure out that 10 were already deployed and then change the count to 5. If you just changed the 10 to 15, you would end up with 25 servers.

Whereas with terraform, it already knows about the 10. So if you change the count to 15, only 5 will be deployed.

With procedural tools, the state of the infrastructure is not fully captured in the code

Ansible allows for mutable infrastructure whereas Terraform forces it to be immutable.

With ansible a history of changes will be built up, ansible will log in via ssh and install something – leading to configuration drift – each server becoming slightly different from another.

With terraform every change to configuration will (usually) return a new object – a new server. You can force terraform to do updates in-place but it is not the natural way. The downside of this is things are deleted – so if you have code or infrastructure with important data on it…it’ll be wiped so need to be very careful.

Terraform Concepts

Terraform has *.tf files that specify the configuration describing your infrastructure. It’s own configuration language, although json can be used.

It is HashiCorp recommended practice that credentials never be hardcoded into *.tf configuration files

Use terraform init to get the correct provider/s.

Then terraform plan to see what will be done on the infrastructure (created, deleted and changed), terraform apply will run plan anyway then ask you to approve.

State is saved in terraform.tfstate or *.tfstate files, so ensure to keep these in version control or use a remote backend.

View current state withterraform show

Remote State Storage

In production environments it is more responsible to share responsibility for infrastructure

The best way to do this is by running Terraform in a remote environment with shared access to state.

Read more about backends in the terraform docs

Terraform with LXD in Practice

Disclaimer: The Terraform-lxd-provider is not maintained by Hashicorp, it is a community provider.

Install Terraform, it is a binary file.

Then install the terraform-lxd-provider from the releases page and then copy that to ~/.terraform.d/plugins on the control host (host running terraform).

Initial setup

You need lxd installed on the managed host, that user needs to be part of the lxd group.

Also install sudo apt install zfsutils-linux

Remember that a default profile resource is installed, called default.

Also ensure that lxd is allowing remote connections with:


    lxc config set core.https_address "[::]"
    lxc config set core.trust_password pass

Provisioning an Infrastructure

On your control host, run terraform init

Lets create our lxd.tf configuration:


# provider to connect to infrastructure
provider "lxd" {
  generate_client_certificates = true
  accept_remote_certificate    = true

  lxd_remote {
    name     = "lxd-server-1"
    scheme   = "https"
    address  = "192.168.33.10"
    password = "pass"
    default  = true
  }
}

# image resources
resource "lxd_cached_image" "ubuntu1804" {
  source_remote = "ubuntu"
  source_image = "18.04"
}

resource "lxd_cached_image" "centos7" {
  source_remote = "images"
  source_image  = "centos/7"
}

# containers
resource "lxd_container" "first" {
  config     = {}
  ephemeral  = false
  limits     = {
      "memory" = "128MB"
      "cpu" = 1
  }
  name       = "first"
  profiles   = [
      "terraform_default",
  ]
  image      = "${lxd_cached_image.ubuntu1804.fingerprint}"
  wait_for_network = false
}

resource "lxd_container" "second" {
  config     = {}
  ephemeral  = false
  limits     = {
      "memory" = "128MB"
      "cpu" = 1
  }
  name       = "second"
  profiles   = [
      "terraform_default",
  ]
  image      = "${lxd_cached_image.centos7.fingerprint}"
  wait_for_network = false
}

# Profile
resource "lxd_profile" "terraform_default" {
    config      = {}
    description = "Default LXD profile created by terrraform"
    name        = "terraform_default"

    device {
        name       = "root"
        properties = {
            "path" = "/"
            "pool" = "${lxd_storage_pool.default.name}"
        }
        type       = "disk"
    }

    device {
        name       = "eth0"
        properties = {
            "nictype" = "bridged"
            "parent" = "${lxd_network.lxdbr0.name}"
        }
        type       = "nic"
    }
}

# Storage Pool
resource "lxd_storage_pool" "default" {
  config = {
      "size"   = "5GB"
      "source" = "/var/lib/lxd/disks/default.img"
      "zfs.pool_name" = "default"
  }
  driver = "zfs"
  name   = "default"
}

# Bridge Network
resource "lxd_network" "lxdbr0" {
  name = "lxdbr0"
  description = "bridge interface for all containers to access hosts internet"

  config = {
    "ipv4.address" = "10.39.58.1/24"
    "ipv4.nat"     = "true"
    "ipv6.address" = "fd42:38a5:c677:b741::1/64"
    "ipv6.nat"     = "true"
  }
}

output "second-ip" {
  value = lxd_container.second.ip_address
}

output "first-ip" {
  value = lxd_container.first.ip_address
}

Do terraform apply, and hopefully if all goes well your infrastructure will be provisioned.

A terraform.tfstate file will now be created.

Deleting a resource with terraform

How would you delete a resource in terraform?

Simply remove the resource from your configuration – along with any other reference.

Then run terraform apply, which will give you this output saying the container will be destroyed:


  # lxd_container.first will be destroyed
  - resource "lxd_container" "first" {
      - config           = {} -> null
      - ephemeral        = false -> null
      - id               = "first" -> null
      - image            = "368bb7174b679ece9bd0dfe2ab953c02c47ff4451736cb255655ba8348f17bc0" -> null
      - ip_address       = "10.39.58.108" -> null
      - limits           = {
          - "cpu"    = "1"
          - "memory" = "128MB"
        } -> null
      - mac_address      = "00:16:3e:eb:d7:3c" -> null
      - name             = "first" -> null
      - privileged       = false -> null
      - profiles         = [
          - "terraform_default",
        ] -> null
      - status           = "Running" -> null
      - wait_for_network = false -> null
    }

The terraform status will have been updated.

Deleting an existing resource outside terraform – within lxc

Lets remove the second container from lxc.


lxc stop second
lxc delete second

Now from terraform run, terraform apply. This will first run a terraform refresh which will update local state file against real resources.
Terraform will now update the local state file that the second container is no longer present.

It will say it will create the resource again:


  # lxd_container.second will be created
  + resource "lxd_container" "second" {
      + ephemeral        = false
      + id               = (known after apply)
      + image            = "368bb7174b679ece9bd0dfe2ab953c02c47ff4451736cb255655ba8348f17bc0"
      + ip_address       = (known after apply)
      + limits           = {
          + "cpu"    = "1"
          + "memory" = "128MB"
        }
      + mac_address      = (known after apply)
      + name             = "second"
      + privileged       = false
      + profiles         = [
          + "terraform_default",
        ]
      + status           = (known after apply)
      + wait_for_network = false
    }

Lets apply this change…

Modifying an existing resource with Terraform

Say we want to change a container from ubuntu to centos, and change another container to limit to 512 mb of RAM. Make the following changes in your configuration file:


resource "lxd_container" "first" {
  config     = {}
  ephemeral  = false
  limits     = {
      "memory" = "128MB"
      "cpu" = 1
  }
  name       = "first"
  profiles   = [
      "terraform_default",
  ]
  image      = "${lxd_cached_image.centos7.fingerprint}"
  wait_for_network = false
}

resource "lxd_container" "second" {
  config     = {}
  ephemeral  = false
  limits     = {
      "memory" = "512MB"
      "cpu" = 1
  }
  name       = "second"
  profiles   = [
      "terraform_default",
  ]
  image      = "${lxd_cached_image.ubuntu1804.fingerprint}"
  wait_for_network = false
}

After a terraform apply, you will see:


An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # lxd_container.first must be replaced
-/+ resource "lxd_container" "first" {
      - config           = {} -> null
        ephemeral        = false
      ~ id               = "first" -> (known after apply)
      ~ image            = "368bb7174b679ece9bd0dfe2ab953c02c47ff4451736cb255655ba8348f17bc0" -> "8ccb053e1872c799616f0eb3534994629192b43e7362df5d0ac5a91772836830" # forces replacement
      ~ ip_address       = "10.39.58.69" -> (known after apply)
        limits           = {
            "cpu"    = "1"
            "memory" = "128MB"
        }
      ~ mac_address      = "00:16:3e:17:03:71" -> (known after apply)
        name             = "first"
        privileged       = false
        profiles         = [
            "terraform_default",
        ]
      ~ status           = "Running" -> (known after apply)
        wait_for_network = false
    }

  # lxd_container.second will be updated in-place
  ~ resource "lxd_container" "second" {
        config           = {}
        ephemeral        = false
        id               = "second"
        image            = "368bb7174b679ece9bd0dfe2ab953c02c47ff4451736cb255655ba8348f17bc0"
        ip_address       = "10.39.58.158"
      ~ limits           = {
            "cpu"    = "1"
          ~ "memory" = "128MB" -> "512MB"
        }
        mac_address      = "00:16:3e:24:cc:62"
        name             = "second"
        privileged       = false
        profiles         = [
            "terraform_default",
        ]
        status           = "Running"
        wait_for_network = false
    }

What is hapening here is that the first container is going to be replaced – deleted and then recreated. Whereas the second container will be updated in place – all things done on the vm/container will be maintained. So be very careful as you certainly don’t want to delete important things…

Modifying an existing resource in LXC

Lets rename a resource in LXC


lxc stop first
lxc rename first my-django-app
lxc start my-django-app

The first container no longer exists, so when running a terraform apply…it will state that it is creating a new container called first. Furthermore, it does not know about the renaming and does not import the renamed container into state or config.

Terraform will therefore not manage that resource. In order to ensure terraform does manage that resource you need to import it. That can be done by modifying the config with the name change and doing a terraform import.


resource "lxd_container" "my-django-app" {
  config     = {}
  ephemeral  = false
  limits     = {
      "memory" = "128MB"
      "cpu" = 1
  }
  name       = "my-django-app"
  profiles   = [
      "terraform_default",
  ]
  image      = "${lxd_cached_image.centos7.fingerprint}"
  wait_for_network = false
}

then do:

terraform import lxd_container.my-django-app my-django-app

The problem is that thisis not done automatically, you have to know about the change made. So if things are managed outside of terraform the state in terraform state files eventually start differing / drifting from config managed directly in the product/service you are managing.

Adding a new resource outside of terraform

What happens when we directly create a new resource in LXC:

lxc launch ubuntu:16.04 -p terraform_default

We are setting it yo use the profile created from terraform as the default profile has no root device.

Terraform will not know about the new image downloaded or the new device.

A problem also is that you cannot import images using this provider – some things just can’t be done but this depends on the provider. This is a community provider.

Error: resource lxd_cached_image doesn’t support import

We can however import the container, provided we know it’s name by creating the resource in config:


resource "lxd_container" "precise-jennet"{

}

then


$ terraform import lxd_container.precise-jennet precise-jennet
lxd_container.precise-jennet: Importing from ID "precise-jennet"...
lxd_container.precise-jennet: Import prepared!
Prepared lxd_container for import
lxd_container.precise-jennet: Refreshing state... [id=precise-jennet]

Import successful!

The resources that were imported are shown above. These resources are now in
your Terraform state and will henceforth be managed by Terraform.

We aren’t done yet, now as we should copy the config of the imported resource from status into config – annoying that terraform does not do this.

You can do a terraform show and then copy the data over into your config, be sure to exlude things that should be there – although this is trail and error…


resource "lxd_container" "precise-jennet" {
    config      = {}
    ephemeral   = false
    id          = "precise-jennet"
    ip_address  = "10.39.58.197"
    limits      = {}
    mac_address = "00:16:3e:bb:09:5e"
    name        = "precise-jennet"
    privileged  = false
    profiles    = [
        "terraform_default",
    ]
    status      = "Running"
}

Extra Sauce

Graphing your infrastructure


brew install graphviz
terraform graph -type=plan | dot -Tpng > plan_graph.png

Pros

You don’t need to know the steps and order in provisioning, just know the resources and configuration. Terraform will do the rest.
Ability to view how infrastrcuture has changed over time
Know the current state or blueprint of your infrastructure with 1 command or 1 file view
No need to read through the platform/product api docs and concern yourself with the edge cases
No need for a cmdb (milage may vary), if the infrastructure is configured as code and checked into a git repo. That is your source of truth.

Cons

Rolling, zero-downtime deployment, are hard to express in purely declarative terms
Difficult creating generic, reusable code.
Terraform is not mature yet. It is 0.12.0, so in active dev and might be better to wait for v1
Importing must be done one at a time – manually. Does not import config into *.tf config failes
Things are deleted and recreated – for things like changing a vm name. Care needs to be taken.
Community providers may not be maintained to the level of those maintained by Hashicorp

Conclusion

Horses for courses.

LXC