Utilizing Apache Airflow for Efficient Job Orchestration in Web Scraping of COVID-19 Data
Overview
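Apache Airflow is a platform for authoring, scheduling, and monitoring workflows, and it goes well beyond what plain cron jobs offer. A workflow in Airflow is defined as a directed acyclic graph (DAG) of tasks: each task is a unit of work such as running a script, and the edges state which tasks must finish before others start. Airflow then runs the DAG on a schedule, which automates the scripts.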
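To make the DAG idea concrete, here is a minimal sketch that uses only the Python standard library (`graphlib`, Python 3.9+). It is not Airflow's API; it only illustrates how dependencies between tasks determine a run order and why cycles are not allowed. The task names are hypothetical and were chosen to resemble a scraping workflow.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names. Each task maps to the set of tasks
# that must finish before it is allowed to start.
dependencies = {
    "fetch_page": set(),
    "parse_cases": {"fetch_page"},
    "save_csv": {"parse_cases"},
    "send_report": {"save_csv"},
}


def execution_order(deps):
    """Return one valid run order for the tasks in deps.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


def is_valid_dag(deps):
    """Return True if deps is acyclic, i.e. it can be run as a workflow."""
    try:
        execution_order(deps)
        return True
    except CycleError:
        return False


if __name__ == "__main__":
    print(execution_order(dependencies))
    # A task that (indirectly) waits on itself can never start.
    print(is_valid_dag({"a": {"b"}, "b": {"a"}}))
```

A scheduler such as Airflow adds much more on top of this ordering (scheduling intervals, retries, logging, a monitoring UI), but the dependency graph is the core of what a DAG file describes.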
Meanwhile, the COVID-19 pandemic remains a grave concern. The outbreak was first identified in Wuhan, China, in December 2019 and spread worldwide in the following months. Monitoring daily COVID-19 patient data in Indonesia is therefore important. Kompas News is one of the platforms that publishes daily COVID-19 updates through a dedicated dashboard. This project scrapes that data with Python and schedules the scraping job with Apache Airflow.
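The parsing step of such a scraper might look like the sketch below. It relies on assumptions throughout: the real markup of the Kompas dashboard is not described here, so `SAMPLE_HTML`, the `label`/`value` class names, and the helper names `CaseParser` and `parse_counts` are all hypothetical. The sketch uses only the standard library and works on an inline sample, so no network request is made.

```python
from html.parser import HTMLParser

# Hypothetical markup: the actual dashboard structure is an assumption.
SAMPLE_HTML = """
<div class="covid-summary">
  <span class="label">Confirmed</span><span class="value">1.234</span>
  <span class="label">Recovered</span><span class="value">987</span>
  <span class="label">Deaths</span><span class="value">56</span>
</div>
"""


class CaseParser(HTMLParser):
    """Collect label/value pairs from <span class="label"> and <span class="value">."""

    def __init__(self):
        super().__init__()
        self._field = None  # class attribute of the <span> currently open
        self._label = None  # most recent label text awaiting its value
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._field = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._field == "label":
            self._label = text
        elif self._field == "value" and self._label:
            # Assumption: "." is used as a thousands separator (1.234 = 1234).
            self.counts[self._label] = int(text.replace(".", ""))
            self._label = None


def parse_counts(html):
    """Return a dict mapping each label to its integer count."""
    parser = CaseParser()
    parser.feed(html)
    return parser.counts


if __name__ == "__main__":
    print(parse_counts(SAMPLE_HTML))
```

In an Airflow deployment, a function like `parse_counts` would typically be called from a task, for example as the `python_callable` of a `PythonOperator`, with a separate upstream task responsible for downloading the page.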
Full article
You can read the full article on Medium.