Setting up Apache Airflow on macOS using Rancher Desktop involves several steps. In this guide, we’ll walk you through installing Rancher Desktop, starting a local Kubernetes cluster, and deploying Airflow using Helm.
Install Rancher Desktop
Rancher Desktop is an open-source desktop application that provides container management and a local Kubernetes cluster on macOS. Download the latest release from the official GitHub repository and follow the installation instructions. After installation, you can start Rancher Desktop from your Applications folder.
Install kubectl and Helm
You will need the kubectl and helm command-line tools to manage your Kubernetes cluster and deploy Airflow. You can install them using Homebrew:
brew install kubectl
brew install helm
Configure Rancher Desktop
Open Rancher Desktop and ensure that it is running. It will automatically create a local Kubernetes cluster using k3s. You can check the cluster status by clicking the Kubernetes icon in the Rancher Desktop window. Wait for the cluster to become active before proceeding.
Set up shared volumes in Kubernetes
Set up a hostpath storage class in your Kubernetes cluster by creating a new YAML file, e.g., hostpath-storageclass.yaml, with the following content:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hostpath
provisioner: docker.io/hostpath
reclaimPolicy: Retain
Apply the storage class configuration:
kubectl apply -f hostpath-storageclass.yaml
To set up the actual host path for the PVCs in the Airflow Helm chart, you need to create a PersistentVolume (PV) for each PVC with the specified host path:
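Here is a minimal sketch of such a PV for the DAGs volume; the 1Gi capacity and ReadWriteMany access mode are assumptions you should adjust to your needs. Save it as airflow-dags-pv.yaml:

```yaml
# Sketch of a hostPath-backed PV for the Airflow DAGs volume.
# Capacity and access mode are assumptions; adjust as needed.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-dags-pv
spec:
  storageClassName: hostpath
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /path/to/your/host/dags
```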
Replace /path/to/your/host/dags with the actual path on your host machine where you want to store the DAGs.
Apply the PV configuration:
kubectl apply -f airflow-dags-pv.yaml
Optionally repeat these steps for other PVCs (logs, data, PostgreSQL, and Redis) by creating separate PV YAML files with the desired host paths and applying them.
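As an illustration, a logs PV could follow the same pattern; the airflow-logs-pv name and the host path below are hypothetical placeholders:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-logs-pv            # hypothetical name
spec:
  storageClassName: hostpath
  capacity:
    storage: 1Gi                   # assumption; size to your needs
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /path/to/your/host/logs  # hypothetical path
```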
Configure and run Airflow
Create a custom values.yaml file to configure the DAGs folder and additional settings. Save the following content as values.yaml on your local machine:
# Airflow version
airflowVersion: "2.4.3"

# Executor
executor: CeleryExecutor

# Worker replicas
workers:
  replicas: 1

# DAGs and plugins configuration
dags:
  gitSync:
    enabled: false
  persistence:
    enabled: true
    existingClaim: airflow-dags-pv

# Extra Python packages and plugins
airflow:
  extraPipPackages:
    - package-name==package-version
    - another-package-name==another-package-version
This configuration disables Git sync and mounts the DAGs folder from the airflow-dags-pv claim, backed by the hostpath storage class set up earlier. Replace the package-name==package-version entries with the names and versions of the desired plugins or Python packages.
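For instance, to bundle a provider package, the entry might look like this (the package and version are purely illustrative):

```yaml
airflow:
  extraPipPackages:
    - "apache-airflow-providers-http==4.1.0"  # illustrative example; pick your own
```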
Deploy Apache Airflow using the custom values.yaml file:
helm repo add apache-airflow https://airflow.apache.org
helm repo update
kubectl create namespace airflow
helm install airflow apache-airflow/airflow \
  --namespace airflow \
  --values values.yaml
Access the Airflow web interface
To access the Airflow web interface, you need to port-forward the Airflow web service to your local machine:
kubectl port-forward --namespace airflow svc/airflow-webserver 8080:8080
Now, you can open a web browser and navigate to http://localhost:8080 to access the Airflow web interface.
Interact with your Airflow deployment
You can now interact with your Airflow deployment using the web interface or the helm command-line tool. To upgrade or uninstall your Airflow deployment, use the helm upgrade and helm uninstall commands, respectively.
Access and modify the DAGs
You can now access and modify the DAGs using the shared folder on your local machine. Any changes to the DAGs in the local folder will be reflected in the Airflow pods.
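For example, you can drop a minimal DAG into the shared folder and let the scheduler pick it up. The ./dags path and the hello_dag name below are placeholders; use the host path you configured for the DAGs PV:

```shell
# Placeholder for the host folder backing the DAGs PV
DAGS_DIR=./dags
mkdir -p "$DAGS_DIR"

# Write a minimal DAG; the Airflow scheduler picks up new files
# in the DAGs folder after a short scan interval.
cat > "$DAGS_DIR/hello_dag.py" <<'EOF'
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hello_dag",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    BashOperator(task_id="hello", bash_command="echo hello")
EOF

ls "$DAGS_DIR"
```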
Remember that a local folder for DAGs is suitable for development and testing purposes but not recommended for production environments. In production, consider using a distributed file system or a version control system like Git to manage DAGs.