Just took some time to set up Airflow and convert an old cronjob into a DAG.
This is way overkill for my needs (it was mostly to play a bit with Airflow), but I find it pretty interesting how the whole tool/paradigm kind of guides you into having an easily debuggable job:
- Since everything is a DAG, it feels a bit off to have just a single task, so you break your work into small tasks.
- But now you need to pass data between tasks, and the simplest way to do that is to write intermediate snapshots.
- And since you need to pick a name for the snapshots, it's quite intuitive to just add a timestamp, so you go for the nicely provided `ds` variable, which is the logical run date of the job (reproducible when you rerun the job, unlike `datetime.now()`). See the sketch after this list.
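Here's a minimal sketch of what that shape ends up looking like, assuming the Airflow 2.x TaskFlow API (the `schedule` argument needs Airflow 2.4+; older versions call it `schedule_interval`). The DAG name, snapshot directory, and the fetch/notify bodies are hypothetical stand-ins, not my actual script:

```python
import json
from datetime import datetime
from pathlib import Path

from airflow.decorators import dag, task

SNAPSHOT_DIR = Path("/tmp/steam_free_games")  # hypothetical snapshot location


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def free_steam_games():
    @task
    def fetch_games(ds=None):
        # Airflow injects `ds`, the logical run date as "YYYY-MM-DD".
        # It stays the same when you rerun a past run, unlike datetime.now().
        games = ["Some Free Game"]  # stand-in for the real Steam lookup
        SNAPSHOT_DIR.mkdir(parents=True, exist_ok=True)
        snapshot = SNAPSHOT_DIR / f"games_{ds}.json"
        snapshot.write_text(json.dumps(games))
        return str(snapshot)  # only the path travels via XCom, not the data

    @task
    def notify(snapshot_path: str):
        games = json.loads(Path(snapshot_path).read_text())
        for game in games:
            print(f"Free on Steam: {game}")  # stand-in for the real notification

    notify(fetch_games())


free_steam_games()
```

Because the snapshot name comes from `ds`, clearing and rerunning a past run rewrites the exact same file instead of producing a new one keyed to wall-clock time, which is what makes the pipeline reproducible.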
So my old hacky shell script that notifies me of free Steam games is now a fully reproducible pipeline with monitoring, logging, and data snapshots.
Maybe now I'll find out why the script stopped working a while ago.