Setting up Apache Airflow on macOS using Rancher Desktop involves several steps. In this guide, we'll walk you through installing Rancher Desktop, starting a local Kubernetes cluster, and deploying Airflow with Helm.
Install Rancher Desktop
Rancher Desktop is an open-source desktop application that provides container management and a local Kubernetes cluster on macOS (as well as Windows and Linux). Download the latest release from the official GitHub repository and follow the installation instructions. After installation, you can start Rancher Desktop from your Applications folder.
Install kubectl and helm
You'll need the kubectl and helm command-line tools to manage your Kubernetes cluster and deploy Airflow. You can install them using Homebrew:
brew install kubectl
brew install helm
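To confirm that both tools are available, you can check their versions:
kubectl version --client
helm version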
Configure Rancher Desktop
Open Rancher Desktop and ensure that it is running. It will automatically create a local Kubernetes cluster using k3s. You can check the cluster status by clicking the Kubernetes icon in the Rancher Desktop window. Wait for the cluster to become active before proceeding.
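You can also verify the cluster from the command line; Rancher Desktop registers a kubectl context named rancher-desktop:
kubectl config use-context rancher-desktop
kubectl get nodes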
Set up shared volumes in Kubernetes
Create a hostpath storage class in your Kubernetes cluster by creating a new YAML file, e.g., hostpath-storageclass.yaml, with the following content:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hostpath
# Static provisioning: the PVs are created manually below,
# so no dynamic provisioner is needed.
provisioner: kubernetes.io/no-provisioner
reclaimPolicy: Retain
Apply the storage class configuration:
kubectl apply -f hostpath-storageclass.yaml
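You can verify that the storage class exists:
kubectl get storageclass hostpath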
To back the PVCs used by the Airflow Helm chart with actual host paths, create a PersistentVolume (PV) for each PVC, pointing it at the desired host path:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-dags-pv
spec:
  capacity:
    storage: 5Gi
  storageClassName: hostpath
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /path/to/your/host/dags
Replace /path/to/your/host/dags with the actual path on your host machine where you want to store the DAGs. Note that on macOS, Rancher Desktop only shares certain host directories (such as those under /Users) into its VM, so a path inside your home directory is a safe choice.
Apply the PV configuration:
kubectl apply -f airflow-dags-pv.yaml
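The chart's existingClaim setting (used below in values.yaml) expects a PersistentVolumeClaim rather than a PV, so you also need a claim that binds to the PV above. A minimal sketch, assuming the name airflow-dags-pvc; save it as airflow-dags-pvc.yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airflow-dags-pvc
  # Must live in the namespace where Airflow will be installed
  namespace: airflow
spec:
  storageClassName: hostpath
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
Because the claim is namespaced, apply it after creating the airflow namespace in the deployment step below.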
Optionally, repeat these steps for the other volumes (logs, data, PostgreSQL, and Redis) by creating separate PV and PVC YAML files with the desired host paths and applying them.
Configure and run Airflow
Create a custom values.yaml file to configure the DAGs folder and additional settings. Save the following content as values.yaml on your local machine:
# Airflow version
airflowVersion: "2.4.3"

# Executor
executor: CeleryExecutor

# Worker replicas
workers:
  replicas: 1

# DAGs and plugins configuration
dags:
  gitSync:
    enabled: false
  persistence:
    enabled: true
    # Name of the PVC created earlier (not the PV)
    existingClaim: airflow-dags-pvc

# Extra Python packages and plugins
airflow:
  extraPipPackages:
    - package-name==package-version
    - another-package-name==another-package-version
This configuration disables git-sync and mounts the DAGs from the PVC backed by the hostpath storage class; the other components (logs, data, PostgreSQL, and Redis) keep the chart's default storage unless you configure them similarly. Replace package-name and package-version with the names and versions of the desired plugins or Python packages.
Deploy Apache Airflow using the custom values.yaml:
helm repo add apache-airflow https://airflow.apache.org
helm repo update
kubectl create namespace airflow
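# Apply the DAGs PVC created earlier, now that its namespace exists
kubectl apply -f airflow-dags-pvc.yaml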
helm install airflow apache-airflow/airflow \
--namespace airflow \
--values values.yaml
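You can watch the pods start up before moving on:
kubectl get pods --namespace airflow --watch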
Access the Airflow web interface
To access the Airflow web interface, you need to port-forward the Airflow web service to your local machine:
kubectl port-forward --namespace airflow svc/airflow-webserver 8080:8080
Now, you can open a web browser and navigate to http://localhost:8080 to access the Airflow web interface. Unless you changed it in values.yaml, the chart creates a default admin account with username admin and password admin.
Interact with your Airflow deployment
You can now interact with your Airflow deployment using the web interface or the kubectl and helm command-line tools. To upgrade or uninstall your Airflow deployment, use the helm upgrade and helm uninstall commands, respectively.
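For example, to roll out a changed values.yaml or to remove the deployment entirely:
helm upgrade airflow apache-airflow/airflow --namespace airflow --values values.yaml
helm uninstall airflow --namespace airflow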
Access and modify the DAGs
You can now access and modify the DAGs using the shared folder on your local machine. Any changes to the DAGs in the local folder will be reflected in the Airflow pods.
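For example, copying a new DAG file into the shared folder makes it visible inside the pods; here my_dag.py is a hypothetical file, and the second command assumes the release name airflow from the install step and the chart's default DAGs location inside the container:
cp my_dag.py /path/to/your/host/dags/
kubectl exec --namespace airflow deploy/airflow-scheduler -- ls /opt/airflow/dags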
Remember that a local folder for DAGs is suitable for development and testing purposes but not recommended for production environments. In production, consider using a distributed file system or a version control system like Git to manage DAGs.