Are low-income people of color disproportionately affected by air pollution in Denver, CO?
Environmental issues such as air pollution, waste management, and the placement of hazardous plants disproportionately affect low-income and marginalized communities. These communities are at high risk for chronic health issues such as asthma and cancer, leading to higher mortality rates. Environmental injustices were brought into the public arena in the 1970s through several landmark court cases.
Today, Flint, Michigan, where almost 60 percent of the population is Black, has been facing a major public health crisis since 2014 due to lead contamination from the city's water pipes. In Colorado, the expansion of I-70 threatens to exacerbate air pollution in the primarily Latino community of Elyria/Swansea. Between Baton Rouge, Louisiana, and New Orleans, Cancer Alley encompasses more than 150 industrial plants that pollute low-income, predominantly Black communities.
Given these widespread environmental injustices, I want to explore how socioeconomic demographics predict exposure to air pollution. While environmental injustice exists throughout the world, I will focus my analysis on Denver, CO. Future analyses may include other cities such as Detroit and Chicago.
Government agencies seeking to set policy in Denver on environmental issues can use this data science project to inform their decisions. In particular, the City of Denver and CDOT can work together to site highways and road systems so that they do not adversely affect low-income minority communities.
For race, ethnicity, income, and poverty data, I will use factfinder.census.gov and run through all zip codes in the city of Denver. While time-consuming, this will produce the granular data I need to conduct my analysis. Air quality data will come from colorado.gov/airquality. I will export air quality data for all monitoring sites within the city of Denver over the past 30 days. These air quality data include carbon monoxide, nitric oxide, nitrogen dioxide, ozone, PM 2.5, and PM 10.
Process for cleaning the data
I began the process by creating three dataframes:
income: combined income data from FactFinder.census.gov for all zip codes in Denver
population: combined population data from FactFinder.census.gov for all zip codes in Denver
airquality: combined air quality data from all monitoring stations in Denver from Colorado.gov/airquality
For each of these dataframes, I created a column holding the zip code for each row, extracted from the filename with regular expressions. Given the way the air quality data was collected (manually scraped from the web into Excel files), the combined airquality dataframe built from all the CSVs contained some empty rows. Therefore, I dropped rows containing missing values. I used dropna(how="any") rather than how="all" because these otherwise empty rows still had a populated zip code column.
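The zip code extraction and the missing-row cleanup described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the file names, the column names (source_file, aqi, zip_code), and the five-digit pattern are all assumptions, and the values are made up.

```python
import re

import pandas as pd

# Assumed pattern: a standalone run of exactly five digits in the file name.
ZIP_PATTERN = re.compile(r"(?<!\d)(\d{5})(?!\d)")


def zip_from_filename(filename):
    """Return the first standalone five-digit run in a file name, or None."""
    match = ZIP_PATTERN.search(filename)
    return match.group(1) if match else None


# Toy frame standing in for the combined air quality data (values invented).
airquality = pd.DataFrame(
    {
        "source_file": ["aq_80216.csv", "aq_80216.csv", "aq_80205.csv"],
        "aqi": [61.0, None, 42.0],
    }
)
airquality["zip_code"] = airquality["source_file"].map(zip_from_filename)

# how="all" keeps the empty reading because its zip_code is still populated;
# how="any" drops every row that has at least one missing value.
kept_all = airquality.dropna(how="all")
cleaned = airquality.dropna(how="any")
```

With this toy data, kept_all still contains the row with the missing AQI, while cleaned does not, which is the behavior the paragraph above relies on.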
I then created two dataframes for analysis. The first dataframe aggregated the data by site and took the mean AQI for each site. The second dataframe calculated the percent of days with “poor” air quality (AQI > 50) over a year for each site.
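The two per-site summaries could be built along these lines. Treat it as a sketch under stated assumptions: the column names (site, aqi) and the readings are invented, and only the AQI > 50 threshold comes from the text.

```python
import pandas as pd

POOR_AQI_THRESHOLD = 50  # from the text: AQI > 50 counts as "poor"

# Toy daily readings; site labels and values are invented for illustration.
daily = pd.DataFrame(
    {
        "site": ["A", "A", "A", "A", "B", "B", "B", "B"],
        "aqi": [40, 60, 70, 30, 20, 55, 45, 40],
    }
)

# First frame: mean AQI per monitoring site.
mean_aqi = daily.groupby("site")["aqi"].mean().rename("mean_aqi").reset_index()

# Second frame: percent of days per site with AQI above the threshold.
# The mean of a boolean column is the fraction of True values.
pct_poor = (
    daily.assign(poor=daily["aqi"] > POOR_AQI_THRESHOLD)
    .groupby("site")["poor"]
    .mean()
    .mul(100)
    .rename("pct_poor_days")
    .reset_index()
)
```

Keeping the threshold in a named constant makes it easy to rerun the summary with a different cutoff for "poor" air quality.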
Since FactFinder provides more data than is relevant to the analysis, I referenced the income_key and population_key files to determine which values to keep in the dataframes. I retained values for the percent of each race in a given zip code and the percent breakdown of income in a given zip code. I then filtered the respective income and population dataframes down to these selected columns and merged the two into one master dataframe, df. Finally, I used the uszipcode package to assign a longitude and latitude to each zip code. This ensures that we can later perform a k-nearest-neighbors analysis on df using the AQI values from airquality.
Exploratory Data Analysis
Below are six linear regression plots of race variables plotted against air quality index for each zip code.
Interestingly, we see either no correlation or a negative correlation between air quality index and Black and Latino populations. The only positive correlation we observe is for Asian populations.
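For readers who want to check the direction of such a trend numerically, the fit behind a linear regression plot can be sketched in plain Python. The function name and the numbers below are hypothetical and are not the study's data. Because a higher AQI means poorer air quality, a negative slope means a larger population share goes with lower AQI, that is, with better air.

```python
def least_squares(xs, ys):
    """Return (slope, intercept, pearson_r) for a simple linear fit of ys on xs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Made-up values: percent of one group per zip code, and AQI per zip code.
pct_group = [10.0, 20.0, 30.0, 40.0]
aqi = [50.0, 48.0, 46.0, 44.0]
slope, intercept, r = least_squares(pct_group, aqi)
```

With only a handful of zip codes, a slope like this is descriptive at best; it says nothing yet about statistical significance.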
Income and Air Quality
As with the race data above, the income data produce some surprising results. It seems that households that take in less income experience better air quality than their wealthier counterparts. This is the opposite of the trend we expected from our hypothesis.
With a working data frame, I imported the data into ArcGIS to perform spatial interpolation. I tested out several interpolation techniques including inverse distance weighting, natural neighbor, and spline.
Inverse distance weighting interpolation produced the most granular spatial data of air pollution. The resulting map is shown below.
Figure 1. Inverse distance weighting of the air quality monitoring stations in Denver, CO. Each green dot is a monitoring station and each purple dot is an approximation of a zip code's center.
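The idea behind inverse distance weighting can be illustrated with a few lines of plain Python. This is a simplified sketch, not what ArcGIS runs internally: the function name, the power of 2, the flat (planar) distance, and the station values are all assumptions made for illustration.

```python
import math


def idw(x, y, stations, power=2.0):
    """Inverse-distance-weighted estimate at the point (x, y).

    stations is a list of (x, y, value) tuples. power=2 is a common default
    and an assumption here; the setting used in the study is not stated.
    """
    if not stations:
        raise ValueError("at least one station is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        dist = math.hypot(x - sx, y - sy)
        if dist == 0.0:
            return float(value)  # exactly on a station: use its reading
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Invented planar coordinates and AQI readings for two stations.
stations = [(0.0, 0.0, 40.0), (2.0, 0.0, 60.0)]
midpoint = idw(1.0, 0.0, stations)    # equally far from both stations
near_first = idw(0.5, 0.0, stations)  # pulled toward the first station
```

Each estimate is a weighted average of the station readings, so a zip code center close to a monitoring station takes on a value close to that station's AQI.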
These results are surprising! There is a positive correlation between wealth and air pollution. Let's take a closer look at the air pollution map below.
In this map (above), red corresponds to a higher AQI (poorer air quality) and green corresponds to a lower AQI (better air quality). We immediately see some interesting patterns. The uppermost red cluster is Elyria Swansea, which is 84% Hispanic. The area is also younger than most of Denver, with 55% of the population living in families with children. The red cluster in the middle is Globeville, another predominantly Hispanic area. Globeville's population is 68% Hispanic.
According to a denver.gov report: "The household income in Elyria Swansea is $44,700, and in Globeville is $39,200, both significantly lower than Denver’s average of $73,100."
The lowest red cluster is west Colfax between City Park West and South Park Hill. This area is predominantly white and middle to upper class.
Figure 2. A close-up of the inverse distance weighting interpolation showcasing the disparity along I-25.
From the spatial analysis performed in this study, we can hypothesize that air quality that is poor relative to the readings at monitoring stations across Denver is correlated with proximity to highways.
Two out of the three locations of poor air quality (red areas on the interpolation map) correspond to disadvantaged communities: Elyria Swansea, which is 84% Hispanic; and Globeville, which is 68% Hispanic.
So, are low-income people of color disproportionately affected by air pollution in Denver? In this study, we discovered two instances where that is the case. To really dig deep into this problem, I will next apply inferential statistics to make a determination!