Kelvin Tay

terraform

As a staff-level support engineer, one of my responsibilities is to empower my teammates to better reproduce customer environments.

CircleCI offers an on-prem solution, CircleCI Server, which is distributed as a Helm chart.

Beyond a Kubernetes cluster (e.g., AWS EKS), you also need to provision external object stores (e.g., AWS S3), IAM entities (e.g., AWS IAM users/roles), and so on.

My original goal was to provision everything within a single Terraform module. By everything, I mean the EKS cluster, the S3 bucket, the Helm release, and so on.

However, as the design progressed, I realized this was not ideal in several ways:

  1. Helm releases are about application deployments, while Terraform applies are about infrastructure deployments. Piggybacking a Helm release onto infrastructure changes felt odd to me. (This article, specifically anti-pattern 4, describes this conflict better than I can.)
  2. Terraform's philosophy requires all resources managed in Terraform to be managed strictly within Terraform. As an administrator, this means any update I want to make to my Helm release has to be done within the Terraform module as well. The documented notes on upgrades suggest that any detected drift can produce unintended changes if the administrator is not careful.
  3. I still want to define my EKS cluster with eksctl and YAML. There is an eksctl provider, but it still requires eksctl to be installed on the host machine.

The various discussions on Reddit (1, 2) also convinced me that it was better to avoid shoehorning the whole setup into a single Terraform module.

I ended up splitting the setup such that:

  • All non-EKS related resources (e.g., S3 bucket, IAM users) are managed via Terraform
  • The administrator creates the EKS cluster via eksctl, but the YAML file is first generated by Terraform's local_file
  • Similarly, the administrator manages the Helm release via helm commands, but the YAML files are generated by Terraform's local_sensitive_file
  • The full commands to run for eksctl and helm are shown via Terraform outputs.
  • Each installation phase is its own Terraform module, and each module reads a previous module's outputs as inputs (e.g., AWS tags) via the terraform_remote_state data source.

This means the administrator has to manage 4 or more Terraform modules instead of 1. However, I find this easier to manage and reason about.

#terraform #helm #kubernetes #iac

buy Kelvin a cup of coffee

I have released support for the CircleCI Runner resource-class and token in my unofficial Terraform provider for CircleCI, as of v0.10.3.

Developers can now manage the provisioning (and teardown) of CircleCI self-hosted Runners within Terraform. You can explore an example here.

This was a fun challenge, and I wanted to document my journey.

Investigation

Unlike other resources, self-hosted runners are not manageable under the official V2 API. Developers had to use the CircleCI CLI to manage resource-classes and tokens instead.

To port this into my Terraform provider, I was hoping there was an HTTP API available. This way, I could continue my approach of abstracting the HTTP API away behind a Go SDK.

I assumed (wrongly) that the CLI used GraphQL under the hood for Runner operations, as it does for many others (e.g., Orbs).

Digging into the source code, I realized that Runner resource-classes and tokens can be managed via an HTTP API; it was simply not publicly documented yet.

Glue

After the legwork mentioned above, I tested the HTTP APIs with an OpenAPI (Swagger 2.0) document. This enabled me to generate a Go SDK for Runner APIs.

Why are the Go SDKs separated? Wouldn't it be easier to keep it all in one?

That was something I mulled over for some time. I have documented my reasons for keeping them separate.

Assembly

With the Go SDK published, I “simply” had to expose the Runner resource-classes and tokens in the Terraform provider codebase.

The main work was done within a pull request here. This also included acceptance tests, and examples.

Finally, I noted that self-hosted (machine) runners are also available to CircleCI Server customers (i.e., self-hosted CircleCI). I wanted to ensure this addition could also be used by platform teams running CircleCI Server.

It turned out that the Runner API:

Thankfully, this was a quick patch. I was also able to verify this fix against my own CircleCI Server instance.

Things I learnt

This feature was satisfying to build, and I learned a lot along the way.

  1. Read the code: This feature would not have been completed had I not dug deeper into the publicly available source code 📖

  2. Keep trying: I am still learning (and failing) at Go. However, I think it is important to keep trying and learning. Keeping this code open source also forces me to stay honest about the gaps in my knowledge. For fellow engineers out there, let's keep at it! 🤓

#terraform #circleci #runner #go


How Terraform works under the hood

I recently published a Terraform provider for CircleCI.

I was motivated to better understand what happens under the hood when we execute Terraform commands.

Here is my attempt to summarize how Terraform works under the hood.

Behind the magic

We can see Terraform as an ecosystem: there is the Terraform core (including its CLI), and a registry of providers and modules.

Ultimately, you can think of the Terraform core as a state machine. It stores the current state of your stack, and syncs the state of your stack against the cloud, via the right providers, based on your Terraform .tf files.

To support the sync, all providers need to implement the CRUD functions on the resource (so that Terraform can create, read, update and delete an AWS EC2 instance, for example).

If you are a cloud service provider that already exposes a RESTful API, then your services are very much “Terraform-able”.

Terraform core talks to providers via gRPC. This is one of the many reasons why HashiCorp recommends writing providers in Go.

#terraform #provider #summary

Infrastructure as Code (IaC) is not a new concept for many. Prior to joining CircleCI, I had wrangled AWS CloudFormation long enough (2 years) to know some of its pain points.

Since joining CircleCI, I have been exposed to Terraform through internal and external projects. So far, I really like that Terraform is not tied to specific cloud providers.

I also like that any service can contribute a Terraform provider, thereby allowing users to define that service's resources as code. (Any cloud resource that exposes a CRUD / RESTful API is a good candidate for Terraform.)

Recently, I have been wishing that many things were “Terraform-able”.

As a support engineer, I use Zendesk daily. In particular, we create Zendesk macros for canned responses. Over time, these macros grow, and it can be hard to manage or validate changes to them. I do wish Zendesk had a Terraform provider, so we could define our macros as code.

Recently, I also wished that I could Terraform my resume. Then, I snapped out of it.

When you have a powerful hammer like Terraform, everything looks like a nail.

#terraform #infrastructureascode #declarative
