Expect unexpected
It was a super early and gloomy autumn morning, and Mother was packing me for the school trip to a day of forest camping. I was still only half awake…
I'm often asked what DataOps is and how to be effective in it. DataOps is a relatively new term that refers to the practice of applying DevOps principles to data…
Here's a guide to setting up Apache Airflow with Docker on a Linux machine, with shared DAGs and plugins folders, extra plugins, specific Python packages on Airflow workers, and a…
Setting up Apache Airflow on macOS using Rancher Desktop involves several steps. In this guide, we'll walk you through installing Rancher Desktop, deploying a Kubernetes cluster, and deploying Airflow using…
Have you ever felt like banging your head against a wall of misunderstanding, or lacked the words to describe a technical solution and prove your point of view? Oh my,…
This article is about writing end-to-end tests for a data pipeline. It will cover Airflow, as one of the most popular data pipeline schedulers nowadays and one of the…
The best way to familiarize yourself with the Hadoop ecosystem, or to do a proof of concept, is to play with it in a sandbox. Cloudera provides 2 Quick Start options:…
Consider the following situation: A bundle of .avro files is stored on HDFS. They need to be converted to Impala tables. Schemas are not provided with files, at least not…