Audhi Aprilliant
Data Scientist. Tech Writer. Statistics, Data Analytics, and Computer Science Enthusiast

Utilizing Apache Airflow for Efficient Job Orchestration in Web Scraping of Covid-19 Data

Overview

Airflow is a platform that takes cron jobs to the next level: it lets you author, schedule, and monitor task workflows. Airflow models each workflow as a directed acyclic graph (DAG) of tasks, so scripts run automatically and their dependencies are explicit.
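
To make this concrete, here is a minimal sketch of a daily DAG; the dag_id, task_id, and schedule are illustrative assumptions, not the article's actual configuration:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def scrape_covid_data():
        # Placeholder for the scraping logic; see the sketch further below
        print("Scraping the daily COVID-19 figures")

    # Run the scraper once a day; Airflow records and can retry each run
    with DAG(
        dag_id="covid19_scraper",               # hypothetical name
        start_date=datetime(2020, 3, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        scrape = PythonOperator(
            task_id="scrape_kompas_dashboard",  # hypothetical name
            python_callable=scrape_covid_data,
        )

Unlike a bare cron entry, every run shows up in Airflow's UI, so a failed scrape is visible and can be retried rather than silently skipped.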

Meanwhile, the COVID-19 pandemic remains a grave concern. The outbreak was first identified in Wuhan, China, in December 2019 and quickly spread worldwide within a month. It is therefore crucial to monitor daily COVID-19 patient data in Indonesia. Kompas News is one of the platforms that publishes daily COVID-19 updates on a dedicated dashboard. This data is scraped with Python, and the job is scheduled with Apache Airflow as the workflow scheduler.
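
As a sketch of the scraping step itself, assuming a requests-plus-BeautifulSoup approach with a hypothetical URL and CSS selector (the article's actual endpoint and parsing logic may differ):

    import requests
    from bs4 import BeautifulSoup

    # Hypothetical address; the real Kompas COVID-19 dashboard URL may differ
    KOMPAS_URL = "https://www.kompas.com/covid-19"

    def fetch_daily_figures() -> str:
        """Download the dashboard page and pull out the headline number."""
        response = requests.get(KOMPAS_URL, timeout=30)
        response.raise_for_status()  # a failed request should fail the Airflow task
        soup = BeautifulSoup(response.text, "html.parser")
        # The selector is illustrative; inspect the live page for the real one
        total = soup.select_one(".covid-case-total")
        return total.get_text(strip=True) if total else ""

Wiring this function into the DAG above as the python_callable gives a scraper that runs, logs, and alerts on the daily schedule.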

Full article

You can read the full article on Medium.
