Setting up Apache Airflow on macOS with Rancher Desktop involves several steps. In this guide, we’ll walk through installing Rancher Desktop, starting its local Kubernetes cluster, and deploying Airflow using Helm.

Install Rancher Desktop

Rancher Desktop is an open-source desktop application that bundles container management and a local Kubernetes cluster on macOS (as well as Windows and Linux). Download the latest release from the official GitHub repository and follow the installation instructions. After installation, you can start Rancher Desktop from your Applications folder.
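Alternatively, if you prefer Homebrew, Rancher Desktop is also available as a cask (named rancher at the time of writing):

brew install --cask rancher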

Install kubectl and helm

You’ll need kubectl and helm command-line tools to manage your Kubernetes cluster and deploy Airflow. You can install them using Homebrew:

brew install kubectl
brew install helm
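
To confirm both tools were installed correctly:

kubectl version --client
helm version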

Configure Rancher Desktop

Open Rancher Desktop and ensure that it is running. It will automatically create a local Kubernetes cluster using k3s. You can check the cluster status by clicking the Kubernetes icon in the Rancher Desktop window. Wait for the cluster to become active before proceeding.
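
Rancher Desktop also adds a rancher-desktop context to your kubeconfig, so you can confirm the node is ready from the terminal:

kubectl get nodes --context rancher-desktop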

Set up shared volumes in Kubernetes

Create a hostpath storage class in your Kubernetes cluster by creating a new YAML file, e.g., hostpath-storageclass.yaml, with the following content. Since we create the PersistentVolumes by hand in the next step, the class uses the kubernetes.io/no-provisioner placeholder rather than a dynamic provisioner (docker.io/hostpath is Docker Desktop's provisioner and is not available on Rancher Desktop's k3s):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hostpath
provisioner: kubernetes.io/no-provisioner
reclaimPolicy: Retain

Apply the storage class configuration:

kubectl apply -f hostpath-storageclass.yaml
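
You can confirm the storage class exists:

kubectl get storageclass hostpath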

To back the PVCs used by the Airflow Helm chart with folders on your host, create a PersistentVolume (PV) for each claim, pointing at the desired host path:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-dags-pv
spec:
  capacity:
    storage: 5Gi
  storageClassName: hostpath
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /path/to/your/host/dags

Replace /path/to/your/host/dags with the actual path on your host machine where you want to store the DAGs. Keep in mind that Rancher Desktop runs Kubernetes inside a virtual machine on macOS, so the path must be in a directory shared with the VM; your home directory is typically shared by default.

Apply the PV configuration:

kubectl apply -f airflow-dags-pv.yaml
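
Verify that the volume was registered and shows an Available status:

kubectl get pv airflow-dags-pv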

Optionally repeat these steps for other PVCs (logs, data, PostgreSQL, and Redis) by creating separate PV YAML files with the desired host paths and applying them.
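
The chart's existingClaim setting (used in values.yaml below) expects a PersistentVolumeClaim, not a PV, so you also need a claim that binds to the volume above. A minimal sketch, assuming the claim name airflow-dags-pvc (our choice; it must match values.yaml):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airflow-dags-pvc
  namespace: airflow
spec:
  storageClassName: hostpath
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

Because the claim must live in the namespace Airflow will be installed into, create the airflow namespace first if it does not exist yet:

kubectl create namespace airflow
kubectl apply -f airflow-dags-pvc.yaml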

Configure and run Airflow

Create a custom values.yaml file to configure the DAGs folder and additional settings. Save the following content as values.yaml on your local machine:

# Airflow version
airflowVersion: "2.4.3"

# Executor
executor: CeleryExecutor

# Worker replicas
workers:
  replicas: 1

# DAGs configuration
dags:
  gitSync:
    enabled: false
  persistence:
    enabled: true
    existingClaim: airflow-dags-pvc

This configuration disables git-sync and mounts the DAGs folder from the airflow-dags-pvc claim created earlier (note that existingClaim must name the PVC, not the PV). The official apache-airflow chart does not install extra Python packages from values.yaml; the extraPipPackages key belongs to the third-party airflow-helm community chart. To add packages or plugins, build a custom Airflow image and point the chart at it.
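
As a sketch of that approach (the image name my-registry/my-airflow and the package names are placeholders), a minimal Dockerfile could look like:

FROM apache/airflow:2.4.3
RUN pip install --no-cache-dir package-name==package-version

Build and push the image somewhere the cluster can pull from, then reference it in values.yaml:

images:
  airflow:
    repository: my-registry/my-airflow
    tag: "2.4.3"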

Deploy Apache Airflow using the custom values.yaml:

helm repo add apache-airflow https://airflow.apache.org
helm repo update
kubectl create namespace airflow   # skip if you created it earlier

helm install airflow apache-airflow/airflow \
  --namespace airflow \
  --values values.yaml
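
The release can take a few minutes to start; watch the pods until they are all Running:

kubectl get pods --namespace airflow --watch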

Access the Airflow web interface

To access the Airflow web interface, you need to port-forward the Airflow web service to your local machine:

kubectl port-forward --namespace airflow svc/airflow-webserver 8080:8080

Now, you can open a web browser and navigate to http://localhost:8080 to access the Airflow web interface. Unless you override it in values.yaml, the chart creates a default admin account with username admin and password admin.

Interact with your Airflow deployment

You can now interact with your Airflow deployment using the web interface or the kubectl and helm command-line tools. To upgrade or uninstall your Airflow deployment, you can use the helm upgrade and helm uninstall commands, respectively.
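
For example:

helm upgrade airflow apache-airflow/airflow \
  --namespace airflow \
  --values values.yaml

helm uninstall airflow --namespace airflow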

Access and modify the DAGs

You can now access and modify the DAGs using the shared folder on your local machine. Any changes to the DAGs in the local folder will be reflected in the Airflow pods.
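
For example, a minimal DAG saved into the shared folder (the file name and dag_id below are arbitrary) should show up in the UI after the scheduler's next scan:

# /path/to/your/host/dags/hello_dag.py
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A single-task DAG that just echoes a message, to confirm the volume mount works.
with DAG(
    dag_id="hello_rancher",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    BashOperator(task_id="say_hello", bash_command="echo Hello from Rancher Desktop")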

Remember that a local folder for DAGs is suitable for development and testing purposes but not recommended for production environments. In production, consider using a distributed file system or a version control system like Git to manage DAGs.
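
If you go the Git route, the official chart has built-in git-sync support; a minimal sketch for values.yaml (the repository URL, branch, and sub-path are placeholders):

dags:
  gitSync:
    enabled: true
    repo: https://github.com/your-org/your-dags.git
    branch: main
    subPath: dags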
