EPA Pollution Data Activity

This activity demonstrates a project in which data is downloaded from the Environmental Protection Agency (EPA) and analyzed in Google Colab. The project can be modified in many ways: for example, one might analyze data from different years, or a pollutant other than the one used in our code.

The EPA has published data on several air pollutants since 1980. The pollutants tracked include carbon monoxide, ozone, and lead. This data can be found by searching for “EPA Daily Data” in your favorite search engine, or by following the link below:

Data: https://tinyurl.com/PIPE-LINE-EPAData

Once you have chosen a pollutant, year, and a region to study, you can upload the data to Google Colab. The code we used is below:

Colab: https://tinyurl.com/PIPELINE-Group5Colab

This code looks specifically at carbon monoxide, so it will have to be altered to investigate other pollutants. It should work for data from any year or region, though. The code runs through some basic descriptive statistics, creates charts showing the level of the pollutant over the year, and runs an ANOVA to test whether the level of the pollutant differs between testing sites.
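For readers who want to see what the ANOVA step computes, here is a minimal sketch using only the Python standard library. It is not the code from the Colab notebook linked above. The column names ("Site ID", "Daily Max 8-hour CO Concentration") are assumptions about the header of the downloaded file, the site labels A, B, and C are placeholders, and the sample values are made up for illustration; they are not real EPA measurements. The helper names readings_by_site and one_way_anova_f are hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, stdev

# Made-up rows for illustration only; NOT real EPA measurements.
# The column names are assumptions about the downloaded file's header.
SAMPLE_CSV = """Date,Site ID,Daily Max 8-hour CO Concentration
01/01/2020,A,0.2
01/02/2020,A,0.3
01/03/2020,A,0.4
01/01/2020,B,0.5
01/02/2020,B,0.6
01/03/2020,B,0.7
01/01/2020,C,0.8
01/02/2020,C,0.9
01/03/2020,C,1.0
"""


def readings_by_site(csv_text,
                     site_col="Site ID",
                     value_col="Daily Max 8-hour CO Concentration"):
    """Group the daily readings by testing site."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[site_col]].append(float(row[value_col]))
    return dict(groups)


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a dict of site -> readings."""
    samples = list(groups.values())
    k = len(samples)                      # number of sites
    n = sum(len(s) for s in samples)      # total number of readings
    grand_mean = sum(sum(s) for s in samples) / n
    site_means = [mean(s) for s in samples]

    # Variation of the site means around the overall mean.
    ss_between = sum(len(s) * (m - grand_mean) ** 2
                     for s, m in zip(samples, site_means))
    # Variation of individual readings around their own site mean.
    ss_within = sum((x - m) ** 2
                    for s, m in zip(samples, site_means) for x in s)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


groups = readings_by_site(SAMPLE_CSV)

# Basic descriptive statistics per site.
for site, values in groups.items():
    print(site, "n =", len(values),
          "mean =", round(mean(values), 3),
          "sd =", round(stdev(values), 3))

f_stat = one_way_anova_f(groups)
print("F =", round(f_stat, 2))
```

A large F value means the site averages differ by more than the day-to-day variation within each site would explain. To turn F into a p-value, compare it against the F distribution with (k - 1, n - k) degrees of freedom; in Colab, scipy.stats.f_oneway returns both the F statistic and the p-value in one call.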
