
brudnak/ha-rancher-rke2


RKE2 Rancher HA Bootstrapper

Deploy Rancher High Availability (HA) clusters on AWS using RKE2 with automated setup and secure configuration.

Key Features

  • No cert-manager required: TLS certificates come from AWS ACM
  • Secure by default: HTTPS is enabled from the first deployment
  • Fully automated: Rancher is installed as part of the setup run
  • Simple workflow:
    1. Configure tool-config.yml (Rancher versions in auto mode, or full Helm commands in manual mode)
    2. Run the deployment test command

Rancher is installed with --set tls=external because TLS is terminated at the ALB using the ACM certificate.

Overview

With this repository you can:

  • Deploy 3-node RKE2 HA clusters with Terraform
  • Auto-configure each node with secure ALB integration
  • Use AWS ACM for certificates (no cert-manager required)
  • Generate and execute custom installation scripts
  • Automatically inject correct URLs into Helm commands
  • Deploy everything with a single go test command

Directory Structure

Place tool-config.yml at the project root:

.
├── README.md
├── tool-config.yml
├── go.mod
├── terratest/
│   └── test.go
├── modules/
│   └── aws/

Deployment

Run the following command to deploy the infrastructure:

go test -v -run TestHaSetup -timeout 60m ./terratest

This command will:

  • Provision EC2 instances, ALBs, and Route53 DNS records
  • Configure TLS with AWS ACM certificates
  • Bootstrap all 3 nodes and join them into a single RKE2 cluster
  • Generate and execute Rancher installation scripts
  • Automatically inject correct URLs into Helm commands

Rancher Installation

Rancher is installed automatically during the setup process:

  1. Correct URLs are injected into each Helm command
  2. Install scripts are generated for each HA instance
  3. Scripts are executed to install Rancher

Rancher is served through the ALB, which terminates HTTPS with an ACM certificate, so cert-manager is not required.

Note: Install scripts remain available in each high-availability-X/ directory for manual re-execution if needed.
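The hostname override described here can be pictured as a simple text substitution. This is a minimal Go sketch under the assumption that the flag appears literally as --set hostname=VALUE; the helpers injectHostname and haDir are hypothetical illustrations, not the tool's actual code, and this sketch does not add a hostname flag if the command has none.

```go
package main

import (
	"fmt"
	"regexp"
)

// hostnameFlag matches "--set hostname=<value>", including an empty value.
var hostnameFlag = regexp.MustCompile(`--set\s+hostname=\S*`)

// injectHostname rewrites the hostname flag in a Helm command so it points at
// the FQDN assigned to this HA instance. Whatever value was there before
// (blank, "placeholder", or a real name) is overridden.
func injectHostname(helmCmd, fqdn string) string {
	return hostnameFlag.ReplaceAllLiteralString(helmCmd, "--set hostname="+fqdn)
}

// haDir returns the per-instance output folder for a zero-based index,
// e.g. 0 -> "high-availability-1".
func haDir(index int) string {
	return fmt.Sprintf("high-availability-%d", index+1)
}

func main() {
	cmd := "helm install rancher rancher-latest/rancher \\\n" +
		"  --namespace cattle-system \\\n" +
		"  --set hostname=placeholder \\\n" +
		"  --set tls=external"
	fmt.Println(haDir(0))
	fmt.Println(injectHostname(cmd, "rancher-1.example.com"))
}
```

Because \S* stops at whitespace, the trailing line-continuation backslash in multi-line Helm commands is left intact.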

Cleanup

To destroy all resources:

go test -v -run TestHACleanup -timeout 20m ./terratest

This will:

  • Destroy all AWS resources created by Terraform
  • Clean up generated files and folders
  • Print a best-effort cost estimate for EC2 and EBS usage

Configuration

Use one of these checked-in examples as your starting point:

Then copy the one you want to tool-config.yml and adjust the non-secret values.

Environment Secrets

These four secrets are read only from environment variables:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • DOCKERHUB_USERNAME
  • DOCKERHUB_PASSWORD

If you use zsh (the default shell on macOS), the simplest option is to export them from ~/.zprofile:

export AWS_ACCESS_KEY_ID="your-aws-access-key"
export AWS_SECRET_ACCESS_KEY="your-aws-secret-key"
export DOCKERHUB_USERNAME="your-dockerhub-username"
export DOCKERHUB_PASSWORD="your-dockerhub-password"

Then reload your shell:

source ~/.zprofile

If you do not want Docker Hub authentication, leave both Docker Hub environment variables unset.
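The rules above (AWS credentials required, Docker Hub credentials optional) can be expressed as a small loader. This is a minimal Go sketch, not the tool's implementation: the Secrets type and loadSecrets function are hypothetical, and rejecting a half-set Docker Hub pair is an assumption of this sketch rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Secrets holds the values read from the environment.
type Secrets struct {
	AWSAccessKeyID     string
	AWSSecretAccessKey string
	DockerHubUsername  string // optional
	DockerHubPassword  string // optional
}

// UseDockerHubAuth reports whether Docker Hub credentials were supplied.
func (s Secrets) UseDockerHubAuth() bool {
	return s.DockerHubUsername != "" && s.DockerHubPassword != ""
}

// loadSecrets reads the four variables via getenv (os.Getenv in practice).
// AWS credentials are required. Docker Hub credentials are optional; this
// sketch assumes that setting only one of the pair is a mistake and fails.
func loadSecrets(getenv func(string) string) (Secrets, error) {
	s := Secrets{
		AWSAccessKeyID:     getenv("AWS_ACCESS_KEY_ID"),
		AWSSecretAccessKey: getenv("AWS_SECRET_ACCESS_KEY"),
		DockerHubUsername:  getenv("DOCKERHUB_USERNAME"),
		DockerHubPassword:  getenv("DOCKERHUB_PASSWORD"),
	}
	if s.AWSAccessKeyID == "" || s.AWSSecretAccessKey == "" {
		return Secrets{}, errors.New("AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set")
	}
	if (s.DockerHubUsername == "") != (s.DockerHubPassword == "") {
		return Secrets{}, errors.New("set both DOCKERHUB_USERNAME and DOCKERHUB_PASSWORD, or neither")
	}
	return s, nil
}

func main() {
	loaded, err := loadSecrets(os.Getenv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("Docker Hub auth enabled:", loaded.UseDockerHubAuth())
}
```

Passing getenv as a function keeps the check testable without touching the real shell environment.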

Sample tool-config.yml

For available RKE2 Kubernetes versions, refer to: RKE2 v1.32.X Release Notes

Important Configuration Notes

  • rancher.mode supports:
    • manual to provide full Helm commands yourself
    • auto to provide one or more Rancher versions and let the tool resolve chart source, image source, RKE2 version, and installer checksum for you
  • In manual mode, the number of Helm commands under rancher.helm_commands must match total_has
  • In auto mode:
    • use rancher.version for a single HA
    • use rancher.versions for multiple HAs, with exactly one version per HA
  • Each Helm command will be used for a specific HA instance (first command for first instance, etc.)
  • You can customize each Helm command with different parameters (bootstrap password, version, etc.)
  • The hostname parameter in each Helm command will be automatically replaced with the correct URL
    • You can leave it blank, use a placeholder, or include your own value (it will be overridden)
  • The tool validates your config shape and fails early if the number of versions or Helm commands does not match total_has
  • The install script is automatically executed for each HA instance during setup
  • In manual mode:
    • use k8s.version for a single HA
    • use k8s.versions for multiple HAs, with exactly one RKE2 version per HA
    • use rke2.install_script_sha256 for a single HA
    • use rke2.install_script_sha256s for multiple HAs, keyed by exact RKE2 version
  • rke2.preload_images: true downloads the RKE2 image bundle before install to help avoid Docker Hub rate limits
  • AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set in your shell environment
  • DOCKERHUB_USERNAME and DOCKERHUB_PASSWORD are optional environment variables
    • If you set them, the tool creates /etc/rancher/rke2/registries.yaml so RKE2 can authenticate to Docker Hub
    • If you leave them unset, the tool skips Docker Hub authentication
  • In auto mode, the tool prints a resolved plan for each HA and asks you to continue before provisioning starts
  • The RKE2 installer is never piped from curl into sh. Instead, the tool:
    • Downloads the installer script for the exact RKE2 version
    • Checks that script against the pinned SHA256
    • Runs the script only if the checksum matches
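The count rules in the notes above (Helm commands or Rancher versions must match total_has) amount to an up-front shape check. This minimal Go sketch covers only the rancher: block; the RancherConfig type, its field names, and validateRancher are hypothetical, YAML decoding is omitted to stay stdlib-only, and treating "both version and versions set" as an error is an assumption.

```go
package main

import "fmt"

// RancherConfig mirrors the rancher: block of tool-config.yml.
// Field names are hypothetical; YAML decoding is left out.
type RancherConfig struct {
	Mode         string   // "auto" or "manual"
	Version      string   // auto mode, single HA (rancher.version)
	Versions     []string // auto mode, one entry per HA (rancher.versions)
	HelmCommands []string // manual mode (rancher.helm_commands)
}

// validateRancher fails early when the config shape does not match total_has.
func validateRancher(r RancherConfig, totalHAs int) error {
	if totalHAs < 1 {
		return fmt.Errorf("total_has must be at least 1, got %d", totalHAs)
	}
	switch r.Mode {
	case "manual":
		if len(r.HelmCommands) != totalHAs {
			return fmt.Errorf("manual mode: %d helm_commands but total_has is %d", len(r.HelmCommands), totalHAs)
		}
	case "auto":
		if r.Version != "" && len(r.Versions) > 0 {
			return fmt.Errorf("auto mode: set rancher.version or rancher.versions, not both")
		}
		n := len(r.Versions)
		if r.Version != "" {
			n = 1
		}
		if n != totalHAs {
			return fmt.Errorf("auto mode: %d Rancher version(s) but total_has is %d", n, totalHAs)
		}
	default:
		return fmt.Errorf("rancher.mode must be \"auto\" or \"manual\", got %q", r.Mode)
	}
	return nil
}

func main() {
	example := RancherConfig{Mode: "auto", Versions: []string{"2.13-head", "2.14.0"}}
	fmt.Println(validateRancher(example, 2))
	fmt.Println(validateRancher(example, 3))
}
```

Running a check like this before any Terraform call is what makes a mismatch cheap: nothing has been provisioned yet.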

Auto Mode Example

Use auto mode when you want to provide a Rancher version and let the tool resolve the rest.

rancher:
  mode: auto
  versions:
    - "2.13-head"
    - "2.14.0"
  distro: auto
  bootstrap_password: "your-password"
  auto_approve: false

rke2:
  preload_images: true

total_has: 2  # Number of HA clusters to create (must match number of rancher.versions in auto mode)

tf_vars:
  aws_region: "us-east-2"
  aws_prefix: "xyz" # your initials, keep it short! 
  aws_vpc: ""
  aws_subnet_a: ""
  aws_subnet_b: ""
  aws_subnet_c: ""
  aws_ami: ""
  aws_subnet_id: ""
  aws_security_group_id: ""
  aws_pem_key_name: ""
  aws_route53_fqdn: ""

In auto mode, the tool will:

  1. Resolve the Rancher chart repo and chart version for each HA version you requested
  2. Resolve the Rancher image settings for each HA
  3. Look up a supported RKE2 minor version from the Rancher support matrix
  4. Pick the latest patch release in that RKE2 line
  5. Resolve the installer SHA256 for that exact RKE2 version
  6. Generate one Helm command per HA and inject the correct URL later during setup
  7. Print the generated plan(s)
  8. Ask you to continue or cancel before provisioning
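Step 4 ("pick the latest patch release in that RKE2 line") can be sketched as a comparison over version tags. This minimal Go example assumes tags of the form vMAJOR.MINOR.PATCH+rke2rREV, treats "latest" as highest patch and then highest rke2r revision, and skips pre-releases such as -rc tags; the tag list is sample data, and where the real tool fetches its list from is not shown here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// rke2Version holds the numeric parts of a tag such as "v1.33.7+rke2r1".
type rke2Version struct {
	major, minor, patch, rev int
}

// parseRKE2 parses "vMAJOR.MINOR.PATCH+rke2rREV"; ok is false for anything
// else, including pre-releases like "v1.33.8-rc1+rke2r1".
func parseRKE2(tag string) (v rke2Version, ok bool) {
	core, rev, found := strings.Cut(strings.TrimPrefix(tag, "v"), "+rke2r")
	if !found {
		return v, false
	}
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return v, false
	}
	nums := make([]int, 0, 4)
	for _, s := range append(parts, rev) {
		n, err := strconv.Atoi(s)
		if err != nil {
			return v, false
		}
		nums = append(nums, n)
	}
	return rke2Version{nums[0], nums[1], nums[2], nums[3]}, true
}

// latestPatch returns the newest stable tag in the major.minor line:
// highest patch first, then highest rke2r revision.
func latestPatch(tags []string, major, minor int) (string, bool) {
	var best rke2Version
	bestTag := ""
	for _, t := range tags {
		v, ok := parseRKE2(t)
		if !ok || v.major != major || v.minor != minor {
			continue
		}
		if bestTag == "" || v.patch > best.patch || (v.patch == best.patch && v.rev > best.rev) {
			best, bestTag = v, t
		}
	}
	return bestTag, bestTag != ""
}

func main() {
	// Sample tags for illustration only.
	tags := []string{"v1.33.5+rke2r1", "v1.33.7+rke2r1", "v1.33.7+rke2r2", "v1.34.6+rke2r1", "v1.33.8-rc1+rke2r1"}
	if tag, ok := latestPatch(tags, 1, 33); ok {
		fmt.Println(tag)
	}
}
```

Comparing parsed integers rather than strings avoids the classic trap where "v1.33.10" sorts before "v1.33.9" lexically.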

For a single HA, you can use this shorter config:

rancher:
  mode: auto
  version: "2.13-head"
  distro: auto
  bootstrap_password: "your-password"
  auto_approve: false

total_has: 1

If you do not want Docker Hub authentication, leave both DOCKERHUB_USERNAME and DOCKERHUB_PASSWORD unset in your shell.

Manual Mode Example

Use manual mode when you want full control over the Helm commands.

rancher:
  mode: manual
  helm_commands:
    - |
      helm install rancher rancher-latest/rancher \
        --namespace cattle-system \
        --set hostname=placeholder \
        --set bootstrapPassword=your-password \
        --set tls=external \
        --set global.cattle.psp.enabled=false \
        --set rancherImageTag=v2.14.0 \
        --version 2.14.0 \
        --set agentTLSMode=system-store
    - |
      helm install rancher rancher-latest/rancher \
        --namespace cattle-system \
        --set hostname=placeholder \
        --set bootstrapPassword=your-password \
        --set tls=external \
        --set global.cattle.psp.enabled=false \
        --set rancherImageTag=v2.14.0 \
        --version 2.14.0 \
        --set agentTLSMode=system-store

total_has: 2

k8s:
  versions:
    - "v1.33.7+rke2r1"
    - "v1.34.6+rke2r1"

rke2:
  install_script_sha256s:
    v1.33.7+rke2r1: "bfbd978d603b7070f5748c934326db509bf1470c97d3f61a3aaa6e2eed6bd054"
    v1.34.6+rke2r1: "2d24db2184dd6b1a5e281fa45cc9a8234c889394721746f89b5fe953fdaaf40a"
  preload_images: true

For a single manual HA, the shorter single-value form also works:

k8s:
  version: "v1.33.7+rke2r1"

rke2:
  install_script_sha256: "bfbd978d603b7070f5748c934326db509bf1470c97d3f61a3aaa6e2eed6bd054"
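Both forms above boil down to one RKE2 version and one pinned checksum per HA. This minimal Go sketch shows one way to resolve that pairing; the K8sRKE2 and HAPlan types and resolvePlans are hypothetical names, YAML decoding is omitted, and the sample checksums in main are the ones from the manual example.

```go
package main

import "fmt"

// K8sRKE2 mirrors the k8s: and rke2: checksum fields of tool-config.yml
// (hypothetical field names; YAML decoding omitted).
type K8sRKE2 struct {
	Version        string            // k8s.version (single HA)
	Versions       []string          // k8s.versions (one per HA)
	InstallSHA256  string            // rke2.install_script_sha256 (single HA)
	InstallSHA256s map[string]string // rke2.install_script_sha256s, keyed by exact RKE2 version
}

// HAPlan is the RKE2 version and pinned installer hash for one HA instance.
type HAPlan struct {
	RKE2Version string
	SHA256      string
}

// resolvePlans pairs each HA with its RKE2 version and pinned installer checksum.
func resolvePlans(c K8sRKE2, totalHAs int) ([]HAPlan, error) {
	// Single-value form: only valid for exactly one HA.
	if totalHAs == 1 && c.Version != "" {
		if c.InstallSHA256 == "" {
			return nil, fmt.Errorf("rke2.install_script_sha256 is required for %s", c.Version)
		}
		return []HAPlan{{c.Version, c.InstallSHA256}}, nil
	}
	// List form: one version per HA, each with a checksum keyed by that version.
	if len(c.Versions) != totalHAs {
		return nil, fmt.Errorf("k8s.versions has %d entries, total_has is %d", len(c.Versions), totalHAs)
	}
	plans := make([]HAPlan, 0, totalHAs)
	for _, v := range c.Versions {
		sum, ok := c.InstallSHA256s[v]
		if !ok || sum == "" {
			return nil, fmt.Errorf("no rke2.install_script_sha256s entry for %s", v)
		}
		plans = append(plans, HAPlan{v, sum})
	}
	return plans, nil
}

func main() {
	example := K8sRKE2{
		Versions: []string{"v1.33.7+rke2r1", "v1.34.6+rke2r1"},
		InstallSHA256s: map[string]string{
			"v1.33.7+rke2r1": "bfbd978d603b7070f5748c934326db509bf1470c97d3f61a3aaa6e2eed6bd054",
			"v1.34.6+rke2r1": "2d24db2184dd6b1a5e281fa45cc9a8234c889394721746f89b5fe953fdaaf40a",
		},
	}
	resolved, err := resolvePlans(example, 2)
	fmt.Println(resolved, err)
}
```

Keying checksums by exact version (rather than by HA index) means two HAs on the same RKE2 version share one entry.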

Updating the RKE2 checksum

You only need to edit checksums by hand in manual mode, and only when you change an RKE2 version; auto mode resolves them for you.

  1. Pick the RKE2 version you want.
  2. Download that exact installer script.
  3. Compute its SHA256.
  4. Paste the hash into tool-config.yml.

Run the following (on Linux, sha256sum /tmp/rke2-install.sh gives the same digest as shasum -a 256):

export RKE2_VERSION="v1.33.7+rke2r1"
curl -fsSL "https://raw.githubusercontent.com/rancher/rke2/${RKE2_VERSION}/install.sh" -o /tmp/rke2-install.sh
shasum -a 256 /tmp/rke2-install.sh

You will get output like:

bfbd978d603b7070f5748c934326db509bf1470c97d3f61a3aaa6e2eed6bd054  /tmp/rke2-install.sh

Copy only the hash on the left and put it into tool-config.yml:

k8s:
  version: "v1.33.7+rke2r1"

rke2:
  install_script_sha256: "bfbd978d603b7070f5748c934326db509bf1470c97d3f61a3aaa6e2eed6bd054"
  preload_images: true

If the downloaded installer does not match the pinned hash, the setup stops immediately and refuses to run it.
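The refuse-on-mismatch check can be sketched in a few lines of Go with the standard library. This minimal example assumes the installer bytes have already been downloaded; the helper verifySHA256 is hypothetical, and the input "hello" stands in for a real installer so that its SHA256 is easy to confirm.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// verifySHA256 returns an error unless data hashes to the pinned hex digest.
// The comparison ignores case and surrounding whitespace in the pinned value.
func verifySHA256(data []byte, pinnedHex string) error {
	sum := sha256.Sum256(data)
	got := hex.EncodeToString(sum[:])
	if !strings.EqualFold(got, strings.TrimSpace(pinnedHex)) {
		return fmt.Errorf("installer checksum mismatch: got %s, want %s", got, pinnedHex)
	}
	return nil
}

func main() {
	script := []byte("hello") // placeholder for the downloaded install.sh
	pinned := "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
	if err := verifySHA256(script, pinned); err != nil {
		fmt.Println("refusing to run installer:", err)
		return
	}
	fmt.Println("checksum OK; safe to run the installer")
}
```

Verifying the in-memory bytes and then executing exactly those bytes (for example after writing them to a temp file) avoids a window where the file could change between check and run.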

Cleanup Cost Estimate

After the destroy step, TestHACleanup prints a best-effort AWS cost estimate for:

  • EC2 runtime
  • EBS root volumes

This is only an estimate, not an AWS bill.

The estimate uses:

  • live AWS pricing data for EC2 and EBS unit prices
  • actual EC2 instance launch times from AWS to estimate runtime
  • actual attached root EBS volumes from AWS to estimate storage cost

It does not include everything AWS might charge for, such as:

  • ALB usage
  • Route53 charges
  • data transfer
  • request-driven costs

Treat the number as a rough estimate of the main infrastructure cost drivers, not a final billing total.

Output Example

Each HA setup creates a folder like:

high-availability-1/
├── install.sh         # Rancher installation script
├── kube_config.yaml   # RKE2 kubeconfig

Contributing

Pull requests and questions are welcome.


Built with Go, Terraform, and Rancher.

About

Rancher HA on AWS with RKE2. Automated setup, ACM SSL, single command deploy.
