How to install Airflow: For Apple M2

First, install Python 3.11: https://www.python.org/downloads/release/python-3116/

Install pip, update pip, and alias Python 3.11: https://pip.pypa.io/en/stable/installation/

Install Zsh Shell: https://github.com/ohmyzsh/ohmyzsh/wiki/Installing-ZSH

Edit the .zshrc file (sudo is not needed for a file in your own home directory): nano ~/.zshrc

Set aliases for Python 3.11 (the python.org installer puts the python3.11 and pip3.11 commands in /usr/local/bin):

alias python='python3.11'
alias pip='pip3.11'

python -m pip install --upgrade pip

Set the AIRFLOW_HOME environment variable:

export AIRFLOW_HOME=~/airflow
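
To confirm which directory Airflow will treat as its home, a small check like the following may help. It is a minimal sketch: the function name airflow_home is made up for this post, and it only mirrors the rule that AIRFLOW_HOME is used when set and ~/airflow is the fallback.

```python
import os
from pathlib import Path


def airflow_home(environ=None):
    """Return the directory Airflow would use as its home.

    Uses AIRFLOW_HOME when it is set, otherwise falls back to ~/airflow.
    The function name is hypothetical; it is not part of Airflow.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("AIRFLOW_HOME") or "~/airflow"
    return Path(raw).expanduser()


if __name__ == "__main__":
    home = airflow_home()
    print("Airflow home:", home)
    print("Config file :", home / "airflow.cfg")
```

Run it in a new terminal after editing .zshrc; if it still prints the fallback, the export line was probably not loaded.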

Install Airflow through the pip wheel (cross-platform: Linux (WSL2) / Apple):

AIRFLOW_VERSION=2.8.0

# Extract the version of Python you have installed. If you're currently using a Python version that is not supported by Airflow, you may want to set this manually.

# See above for supported versions.

PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
# For example this would install 2.8.0 with python 3.8: https://raw.githubusercontent.com/apache/airflow/constraints-2.8.0/constraints-3.8.txt
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
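
The constraint URL is just the Airflow version and the major.minor Python version dropped into a fixed pattern. As a rough sketch of the same logic in Python (the helper name constraint_url is made up for this post), you can print the URL and the matching pip command before running anything:

```python
import sys


def constraint_url(airflow_version, python_version=None):
    """Build the constraints file URL for an Airflow and Python version pair.

    python_version is a (major, minor) tuple; it defaults to the
    interpreter running this script. The helper name is hypothetical.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    major, minor = python_version[:2]
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{major}.{minor}.txt"
    )


if __name__ == "__main__":
    version = "2.8.0"
    url = constraint_url(version)
    # Print the command instead of running it, so it can be reviewed first
    print(f'pip install "apache-airflow=={version}" --constraint "{url}"')
```

If the printed URL does not end in constraints-3.11.txt, the python alias is probably not pointing at Python 3.11 yet.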

Install Swagger Python:

pip install swaggerpy

Edit .zshrc again (nano ~/.zshrc) and add this line to the end of the file:

export PATH=/Library/Frameworks/Python.framework/Versions/3.11/bin:$PATH
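
After opening a new terminal, it can be worth checking that the PATH change took effect. The sketch below is one way to do that; the helper is_on_path and the constant PY311_BIN are names invented for this post, and the directory is the one from the export line.

```python
import os
import shutil

# Directory from the export line in .zshrc
PY311_BIN = "/Library/Frameworks/Python.framework/Versions/3.11/bin"


def is_on_path(directory, path_value=None):
    """Return True if directory is one of the entries in PATH.

    path_value defaults to the current PATH environment variable.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    wanted = os.path.normpath(directory)
    entries = [p for p in path_value.split(os.pathsep) if p]
    return any(os.path.normpath(p) == wanted for p in entries)


if __name__ == "__main__":
    print("Python 3.11 bin on PATH:", is_on_path(PY311_BIN))
    # shutil.which returns None when the airflow command cannot be found
    print("airflow command found at:", shutil.which("airflow"))
```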

TO START AIRFLOW (on a fresh install, run airflow db migrate once first to initialize the metadata database):

airflow scheduler

Open another terminal in VS Code and start the webserver:

airflow webserver --port 8081
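
Once both processes are running, the webserver exposes a /health endpoint that reports on the metadata database and the scheduler. The following is a minimal sketch for polling it, assuming the webserver is on localhost and port 8081 as above; the function names are made up for this post, and the result is printed as-is rather than relying on any particular JSON layout.

```python
import http.client
import json
import urllib.error
import urllib.request


def health_url(port=8081, host="localhost"):
    """Build the URL of the webserver health endpoint."""
    return f"http://{host}:{port}/health"


def fetch_health(port=8081, host="localhost", timeout=3.0):
    """Return the parsed /health JSON, or None if the webserver is unreachable."""
    try:
        with urllib.request.urlopen(health_url(port, host), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, or a response that is not JSON
        return None


if __name__ == "__main__":
    status = fetch_health()
    if status is None:
        print("Webserver not reachable at", health_url())
    else:
        print(json.dumps(status, indent=2))
```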

Use any sample CSV you have to run this script; the path and the column name can be adjusted to fit any CSV.
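
If no CSV is at hand, a small stand-in file can be generated. This is only a sketch: the old_price column matches the column the sample DAG sorts by, while the name column, the rows, and the helper write_sample_csv are invented here for illustration. The demo writes into the system temp directory so it does not overwrite anything.

```python
import csv
import tempfile
from pathlib import Path

# Made-up rows; only the old_price column is taken from the sample DAG
SAMPLE_ROWS = [
    {"name": "Laptop A", "old_price": 1299.0},
    {"name": "Laptop B", "old_price": 849.5},
    {"name": "Laptop C", "old_price": 999.99},
]


def write_sample_csv(path):
    """Write the sample rows to path, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["name", "old_price"])
        writer.writeheader()
        writer.writerows(SAMPLE_ROWS)
    return path


if __name__ == "__main__":
    target = Path(tempfile.gettempdir()) / "laptops_sample.csv"
    print("Wrote", write_sample_csv(target))
```

Point the paths in the DAG at the generated file, or copy it to wherever you keep your test data.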

Point your configuration file at the DAG directory that you want to use. The configuration file is in ~/airflow and its name is airflow.cfg.

[core]
# The folder where your airflow pipelines live, most likely a
# subfolder in a code repository. This path must be absolute.
#
# Variable: AIRFLOW__CORE__DAGS_FOLDER
#
dags_folder = /Users/####/Documents/luceeapp/airflows
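
To double-check which DAG folder is in effect, the setting can be read back. The sketch below assumes airflow.cfg parses with Python's standard configparser and that the AIRFLOW__CORE__DAGS_FOLDER environment variable, when set, takes precedence over the file; the function name dags_folder_from_cfg is made up for this post.

```python
import configparser
import os
from pathlib import Path


def dags_folder_from_cfg(cfg_text, environ=None):
    """Return the DAG folder from airflow.cfg text.

    The AIRFLOW__CORE__DAGS_FOLDER environment variable wins when set.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("AIRFLOW__CORE__DAGS_FOLDER")
    if override:
        return Path(override)
    # interpolation=None keeps literal % characters in other settings intact
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return Path(parser.get("core", "dags_folder"))


if __name__ == "__main__":
    home = Path(os.environ.get("AIRFLOW_HOME") or "~/airflow").expanduser()
    cfg_path = home / "airflow.cfg"
    if cfg_path.exists():
        folder = dags_folder_from_cfg(cfg_path.read_text(encoding="utf-8"))
        print("DAG folder:", folder)
        print("Absolute path:", folder.is_absolute())
    else:
        print("No config file found at", cfg_path)
```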

Here is a sample DAG:

import pendulum
from airflow.datasets import Dataset
from airflow.models.dag import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator  # Import PythonOperator
import pandas as pd

# Dataset (the CSV file) that the DAG below marks as updated
dag1_dataset = Dataset("/Users/####/Documents/luceeapp/testcsv/laptops.csv", extra={"hi": "bye"})

# ...

def sort_data(**kwargs):
    # Load the dataset
    dataset_path = "/Users/####/Documents/luceeapp/testcsv/laptops.csv"
    df = pd.read_csv(dataset_path)

    # Sort the data by a specific column, here 'old_price' (change it to a column in your CSV)
    sorted_df = df.sort_values(by='old_price')

    # Save the sorted data back to the same file or a new one
    sorted_df.to_csv(dataset_path, index=False)

# ...

with DAG(
    dag_id="dataset_produces_1",
    catchup=False,
    start_date=pendulum.datetime(2024, 1, 9, tz="UTC"),
    schedule="@daily",
    tags=["new", "task"],
) as dag1:
    # dag1_dataset is already defined at module level above, so it is reused here

    # Task that marks the dataset as updated when it succeeds
    bash_task = BashOperator(outlets=[dag1_dataset], task_id="New_Task", bash_command="sleep 5")
    # Add a PythonOperator for sorting the data
    sort_data_task = PythonOperator(
        task_id="sort_data_task",
        python_callable=sort_data,
    )

    # Set task dependencies
    bash_task >> sort_data_task  # Adjust the dependencies as needed

The DAG will run once the scheduler picks it up from the DAG folder.

The code sorts the CSV by one column and writes the result back to the same file; it can be adjusted for other things. This is just a basic coding example, because people literally had nothing to go on to get this started.
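
After a run, a quick way to confirm that the task did its job is to check that the column is in ascending order. This is a rough sketch using only the standard library rather than pandas; it assumes every value in the column is a plain number, and the helper name column_is_sorted is invented for this post. The demo uses a throwaway temp file, so replace the path with your own CSV.

```python
import csv
import os
import tempfile


def column_is_sorted(path, column="old_price"):
    """Return True if the values in column are in ascending order.

    Assumes every row holds a plain number in that column.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        values = [float(row[column]) for row in csv.DictReader(handle)]
    return all(a <= b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    # Throwaway file for the demo; point this at your own CSV instead
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
    with open(demo_path, "w", newline="", encoding="utf-8") as handle:
        handle.write("name,old_price\nA,849.5\nB,999.99\nC,1299.0\n")
    print("Sorted by old_price:", column_is_sorted(demo_path))
```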
