Setting up S3 for logs in Airflow

UPDATE: Airflow 1.10 makes logging a lot easier. For S3 logging, set up the connection hook as per the above answer and then add the following to airflow.cfg: [core] # Airflow can store logs remotely in AWS S3. Users must supply a remote # location URL (starting with either 's3://…') and an Airflow connection …
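For reference, the remote-logging keys in the [core] section of an Airflow 1.10 airflow.cfg are remote_logging, remote_base_log_folder, remote_log_conn_id and encrypt_s3_logs (Airflow 2 moved them to a [logging] section). The Python sketch below renders those keys with configparser and also lists the equivalent AIRFLOW__CORE__* environment variables. The bucket path and the connection id MyS3Conn are hypothetical placeholders; use the connection you created for the hook.

```python
import configparser
import io

# Hypothetical values: replace with your own bucket path and Airflow connection id.
REMOTE_LOG_FOLDER = "s3://my-bucket/airflow/logs"
S3_CONN_ID = "MyS3Conn"


def build_core_logging_section(folder: str, conn_id: str) -> str:
    """Render the Airflow 1.10 [core] remote-logging keys as airflow.cfg text."""
    cfg = configparser.ConfigParser()
    cfg["core"] = {
        "remote_logging": "True",
        "remote_base_log_folder": folder,
        "remote_log_conn_id": conn_id,
        "encrypt_s3_logs": "False",
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


def as_env_vars(folder: str, conn_id: str) -> dict:
    """Express the same settings as AIRFLOW__{SECTION}__{KEY} environment variables."""
    cfg = configparser.ConfigParser()
    cfg.read_string(build_core_logging_section(folder, conn_id))
    return {f"AIRFLOW__CORE__{key.upper()}": value for key, value in cfg["core"].items()}


print(build_core_logging_section(REMOTE_LOG_FOLDER, S3_CONN_ID))
```

The environment-variable form is handy when airflow.cfg is baked into an image and you would rather not edit it.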

Problem with start date and scheduled date in Apache Airflow

Airflow schedules tasks at the end of the interval (see the documentation reference). This means that with start_date: datetime(2020, 12, 7, 8, 0, 0) and schedule_interval: '0 8 * * *', the first run will kick in at about 2020-12-08 08:00 (depending on available resources). This run's execution_date will be 2020-12-07 08:00. The next run will kick in …
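The arithmetic behind this can be sketched in plain Python. The sketch assumes the fixed daily interval implied by '0 8 * * *' and models only the rule described above: each run is labelled with the start of its interval (execution_date) and is triggered once that interval has ended. It does not model catchup, scheduler delays or other schedule types.

```python
from datetime import datetime, timedelta

# Values from the example above; a '0 8 * * *' schedule is a fixed one-day interval.
START_DATE = datetime(2020, 12, 7, 8, 0, 0)
INTERVAL = timedelta(days=1)


def nth_run(n: int, start: datetime = START_DATE, interval: timedelta = INTERVAL):
    """Return (execution_date, earliest_trigger_time) for the n-th run, counting from 0.

    The run is labelled with the start of its interval and can only be
    triggered once that interval is over.
    """
    execution_date = start + n * interval
    trigger_time = execution_date + interval
    return execution_date, trigger_time


for n in range(3):
    execution_date, trigger_time = nth_run(n)
    print(f"run {n}: execution_date={execution_date:%Y-%m-%d %H:%M}, "
          f"triggers at ~{trigger_time:%Y-%m-%d %H:%M}")
```

Run 0 reproduces the numbers in the answer: execution_date 2020-12-07 08:00, triggered at about 2020-12-08 08:00.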

Proper way to create dynamic workflows in Airflow

Here is how I handled a similar request without any SubDAGs. First, create a function that returns whatever values you want: def values_function(): return values. Next, create a function that generates the jobs dynamically: def group(number, **kwargs): # load the values if needed in the command you plan to execute dyn_value = "{{ task_instance.xcom_pull(task_ids='push_func') …
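The overall shape of this approach is a loop that creates one task per value and wires each between a start and an end task (start >> task >> end in Airflow). So that the sketch runs without Airflow, it uses a hypothetical TaskSpec dataclass in place of a real operator such as BashOperator, and it records dependencies as (upstream, downstream) pairs. The values returned by values_function and the echo command are placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TaskSpec:
    """Hypothetical stand-in for an Airflow operator."""
    task_id: str
    command: str


def values_function() -> List[int]:
    # Placeholder: return whatever values should drive task generation.
    return [1, 2, 3]


def group(number: int) -> TaskSpec:
    # The Jinja template is just a string here; Airflow would render it at run
    # time and pull the XCom value pushed by the "push_func" task.
    dyn_value = "{{ task_instance.xcom_pull(task_ids='push_func') }}"
    return TaskSpec(task_id=f"dyn_task_{number}", command=f"echo {dyn_value} {number}")


def build_edges(start_id: str, end_id: str) -> List[Tuple[str, str]]:
    """Fan out from start to one generated task per value, then fan back in to end."""
    edges = []
    for value in values_function():
        task = group(value)
        edges.append((start_id, task.task_id))  # start >> task
        edges.append((task.task_id, end_id))    # task >> end
    return edges


print(build_edges("start", "end"))
```

Because task ids are derived from the values, the values should be stable between scheduler parses. Otherwise tasks will appear and disappear from the DAG.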

How to submit Spark jobs to EMR cluster from Airflow?

While this may not directly address your particular question, broadly, here are some ways you can trigger spark-submit on a (remote) EMR cluster via Airflow. Use Apache Livy: this solution is independent of the remote server, i.e., it is not specific to EMR. Here's an example. The downside is that Livy is in its early stages and its API appears incomplete and wonky …
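With Livy, a spark-submit becomes an HTTP POST to its /batches endpoint with a JSON body naming the application file and main class. The sketch below builds that request using only the standard library and does not send it. The host name, jar path, class name and arguments are hypothetical. Port 8998 is Livy's default, but check your cluster's configuration.

```python
import json
import urllib.request

# Hypothetical endpoint: Livy on the EMR master node, default port 8998.
LIVY_URL = "http://emr-master.example.com:8998"


def build_livy_batch_request(app_file: str, class_name: str, args=None) -> urllib.request.Request:
    """Build (but do not send) a POST /batches request, Livy's analogue of spark-submit."""
    payload = {"file": app_file, "className": class_name, "args": list(args or [])}
    return urllib.request.Request(
        url=f"{LIVY_URL}/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_livy_batch_request(
    "s3://my-bucket/jars/my-spark-job.jar",  # hypothetical application jar
    "com.example.MySparkJob",                # hypothetical main class
    ["--date", "2020-12-07"],
)
# Inside an Airflow task (e.g. a PythonOperator callable) you would send it with
# urllib.request.urlopen(req) and then poll the returned batch id for its state.
```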

Wiring top-level DAGs together

Taking hints from @Viraj Parekh's answer, I was able to make TriggerDagRunOperator work as intended. I'm posting my (partial) answer here and will update it as things become clear. How can you overcome the limitation of the parent_id prefix in the dag_id of SubDags? As @Viraj said, there's no straightforward way of achieving this. Extending SubDagOperator to …
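The prefix limitation comes from a naming check in SubDagOperator. In Airflow 1.x it rejects a sub-DAG whose dag_id is not of the form '{parent_dag_id}.{task_id}', so an existing top-level DAG cannot be reused as a SubDAG under its own name. The plain-Python sketch below mirrors that rule without importing Airflow. The DAG and task ids in the usage lines are hypothetical.

```python
def subdag_id(parent_dag_id: str, task_id: str) -> str:
    """Return the dag_id SubDagOperator expects: '<parent_dag_id>.<task_id>'."""
    return f"{parent_dag_id}.{task_id}"


def is_valid_subdag_id(child_dag_id: str, parent_dag_id: str, task_id: str) -> bool:
    """Mirror SubDagOperator's check on the child DAG's id."""
    return child_dag_id == subdag_id(parent_dag_id, task_id)


print(subdag_id("parent_dag", "section_1"))
print(is_valid_subdag_id("standalone_dag", "parent_dag", "section_1"))
```

This is why wiring independent top-level DAGs together tends to go through TriggerDagRunOperator rather than SubDagOperator.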

Pip error even though Microsoft Visual C++ 14.0 is installed

This problem was solved on a computer with Visual Studio Community 2017 v15.5.2 and Visual Studio Installer v1.16.1247.518 installed. The steps were as follows: Start the Visual Studio Installer. The Visual Studio Installer showed an Installed section listing Visual Studio Community 2017. In that section was a drop-down titled More. The drop- …