I finally decided to bring some infrastructure as code love to my Proxmox home lab. I have used both Ansible and Terraform in a couple of projects at work, so it was an easy decision. Of course the learning never stops, and since I don’t have a programming background some concepts are harder to chew than others, but the wealth of great blogs and projects makes it quite easy to pick up the skills and start with IaC.

You can jump to Project Links to access the code repositories on GitLab.

Let’s start with Packer

I used Packer briefly at the start of this automation project, but it was eventually replaced by cloud-init, simply because it’s so much easier to get the initial configuration (hostname, user, SSH key, etc.) done with cloud-init images.

Nevertheless, it is a powerful automation tool, and here’s a video using Packer to create an Ubuntu Linux template on Proxmox.


Terraform

Terraform does not have an official provider for Proxmox, but luckily there’s a community provider and plugin by Telmate.

I started with the sample configuration and modified it to suit my requirements, mainly adding count and iteration to create multiple VMs. Cloud-init seemed like a good option, so I created a template using the official Proxmox documentation and added user, password, DNS domain, DNS servers, SSH public key and IP configuration values.
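Here’s a minimal sketch of how a VM resource could look with the Telmate provider; the node name, template name, DNS values and variable names are placeholders for my lab setup, not the exact code from the repository:

    resource "proxmox_vm_qemu" "lab_vm" {
      count       = var.vm_count                 # spin up several VMs from one block
      name        = "lab-vm-${count.index + 1}"
      target_node = "pve"                        # Proxmox node name (placeholder)
      clone       = "ubuntu-cloudinit"           # cloud-init template created earlier
      os_type     = "cloud-init"

      cores  = 2
      memory = 2048

      # cloud-init values injected at first boot
      ciuser       = var.ci_user
      cipassword   = var.ci_password
      sshkeys      = var.ssh_public_key
      searchdomain = "itnoobs.local"
      nameserver   = "192.168.1.1"               # lab DNS server (placeholder)
      ipconfig0    = "ip=dhcp"
    }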

With the configuration and template ready, I typed away Terraform commands at the console and created a bunch of VMs. At first boot, cloud-init automatically set the hostnames to the VM names, and I logged in to a few VMs via the console to confirm the username and password were set correctly. So far everything was great … well, almost!
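For reference, the commands in question are just the usual Terraform cycle:

    terraform init      # download the Telmate provider plugin
    terraform plan      # preview the VMs to be created
    terraform apply     # create them
    terraform destroy   # tear them all down again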

I soon found out that the provisioned VMs would, at first boot, pass the generic ubuntu hostname during the DHCP lease, before cloud-init provisioning kicked in. Basically, DHCP registers and updates the IP address of each provisioned VM, in boot order, against the ubuntu.itnoobs.local entry in DNS. If I provisioned 10 VMs, I’d get 10 VMs sitting there with correct and unique hostnames, but DNS wouldn’t know about any of them until a reboot or manual intervention! You can see this behaviour in the video below:


Unfortunately, Terraform was never built to allow a user-specified reboot or shutdown, but it does have both local and remote provisioners, so I could use an Ansible playbook to do some extra work on the VMs.
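A local-exec provisioner can kick off the playbook as soon as each VM is created. This is only a sketch; the inventory, the extra-vars and the playbook path are hypothetical, not the exact code from the repository:

    resource "proxmox_vm_qemu" "lab_vm" {
      # ... VM definition as above ...

      # hypothetical hand-off to Ansible; var.proxmox_host and vm_name are
      # placeholders for whatever the playbook actually expects
      provisioner "local-exec" {
        command = "ansible-playbook -i '${var.proxmox_host},' phase1.yaml --extra-vars 'vm_name=${self.name}'"
      }
    }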

Ansible

Since I had to use Ansible to reboot the newly provisioned VMs, there was an opportunity to do a bit of configuration too.

I created a couple of simple Ansible playbooks; the first to remove the cloud-init drive and reboot the VM, and the second to install qemu-guest-agent, add the CPU hot plug configuration and reboot the VM again.

Ansible provides a community module for Proxmox, but I didn’t need that for a couple of simple tasks. An SSH connection to the Proxmox host and the qm commands were sufficient:

phase1.yaml

  • Connect to host and elevate access (become)
  • Get the VMID from VM name
  • Use qm commands to stop the VM, remove the cloud-init drive and start the VM
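Roughly, phase1.yaml could look like this; the inventory group, the grep/awk lookup and the ide2 drive slot are assumptions about my template, not the repository code verbatim:

    - hosts: proxmox
      become: true
      vars:
        vm_name: lab-vm-1            # passed in from Terraform in practice
      tasks:
        - name: Look up the VMID from the VM name
          shell: "qm list | grep ' {{ vm_name }} ' | awk '{print $1}'"
          register: vmid

        - name: Stop the VM
          command: "qm stop {{ vmid.stdout }}"

        - name: Remove the cloud-init drive (attached as ide2 on my template)
          command: "qm set {{ vmid.stdout }} --delete ide2"

        - name: Start the VM again
          command: "qm start {{ vmid.stdout }}"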

And to install the guest agent and create the hot plug configuration:

phase2.yaml

  • Connect to the provisioned VM via SSH
  • Elevate access (cloud-init images have the NOPASSWD option configured out of the box)
  • Update APT cache
  • Install qemu-guest-agent
  • Check if CPU hot plug config already exists, and if not:
    • Create the configuration file and its content:

      echo 'SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}=="0", ATTR{online}="1"' | sudo tee /lib/udev/rules.d/80-hotplug-cpu.rules

    • Reboot
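Put together, phase2.yaml is roughly the following; the inventory group name is a placeholder, and the rule path matches the command above:

    - hosts: lab_vms
      become: true
      tasks:
        - name: Update APT cache
          command: apt-get update

        - name: Install qemu-guest-agent
          command: apt-get install -y qemu-guest-agent

        - name: Check whether the CPU hot plug rule already exists
          stat:
            path: /lib/udev/rules.d/80-hotplug-cpu.rules
          register: hotplug_rule

        - name: Create the CPU hot plug udev rule
          copy:
            dest: /lib/udev/rules.d/80-hotplug-cpu.rules
            content: 'SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}=="0", ATTR{online}="1"'
          when: not hotplug_rule.stat.exists

        - name: Reboot the VM
          reboot:
          when: not hotplug_rule.stat.exists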

After a few test runs, the phase2 playbook would intermittently fail because of corrupted files in /var/lib/dpkg/updates and suggest running dpkg --configure -a to fix it. I assume this is caused either by Terraform provisioning at some stage or by the unclean shutdown of the VM with qm commands.

Anyhow, it had to be fixed, so a few more tasks were added. I used commands instead of the Ansible APT module because of inconsistent behaviour and intermittent failures in my tests.
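The extra task is essentially the command the error message suggests; a sketch of how it could sit before the package tasks in phase2.yaml:

    - name: Repair any interrupted dpkg state before touching packages
      command: dpkg --configure -a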

Secrets & Logins

I think it’s very important to understand and implement secure storage for secrets, and it is generally easy to do so. To promote the idea, this project relies on a cloud-based Key Vault by default.

I have a Visual Studio subscription from work that gets me $50 AUD of Azure credit per month, so I tend to use Azure solutions, but if that weren’t the case, I’d consider HashiCorp’s Vault as a free on-prem alternative.

To allow Terraform access to Azure resources, you need to log in via the Azure CLI. If you’re using a CI/CD tool (Octopus Deploy or GitLab, for example), it will most likely have a secret management feature or handle cloud connections securely.
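With an az login done beforehand, the azurerm provider picks up the CLI session and Terraform can read secrets straight from the vault. The vault, resource group and secret names below are placeholders for my own Azure resources:

    provider "azurerm" {
      features {}
    }

    data "azurerm_key_vault" "lab" {
      name                = "homelab-kv"          # placeholder vault name
      resource_group_name = "homelab-rg"
    }

    data "azurerm_key_vault_secret" "proxmox_password" {
      name         = "proxmox-root-password"      # placeholder secret name
      key_vault_id = data.azurerm_key_vault.lab.id
    }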

SSH connections for Ansible use the configured key pair authentication. The Proxmox root password is the only secret passed from Terraform to the ansible-playbook command; it is unfortunately displayed in plain text on the console, but I’ve accepted to live with that for now.

Remote State File

Terraform uses a local state file by default. This file is very important and keeps all the up-to-date information about your infrastructure. When a resource is created or destroyed, this file is updated to keep track. A remote state file is quite useful and a must-have for a shared or team project. You can read more about it HERE.

Even though I am not sharing this project/code with a team, I use a remote state file because it allows me to run my project from anywhere without needing to keep a local copy. You should never share or source control the Terraform state file, because it contains sensitive information about your environment, especially for resources in the public cloud.
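Since the rest of my secrets already live in Azure, an azurerm backend in a storage account is one option; the names below are placeholders rather than my actual resources:

    terraform {
      backend "azurerm" {
        resource_group_name  = "homelab-rg"
        storage_account_name = "homelabtfstate"
        container_name       = "tfstate"
        key                  = "proxmox-lab.tfstate"
      }
    }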

All the code/configuration shown in this post is available on GitLab.


Quick demo creating and destroying VMs with Terraform!

Zero to five in 5 minutes!


Five to zero in 12 seconds!