This is definitely going to be much different from previous posts. Instead of looking at the statistical insights of my favorite baseball players, I’ll be taking a look at some of the air quality data gathered by San Antonio’s SmartCity sensors. The sensors are placed in three different parts of the city: Downtown, the Medical Center, and the Brooks City Base area.

These SmartCity sensors were installed several years ago as part of a new San Antonio initiative, but the data does not seem to have been examined in depth so far. The sensors can measure and track multiple variables. What I’m most interested in for this first part of the project is the air quality measured at these locations.
San Antonio has struggled recently to meet air quality standards of the EPA’s Clean Air Act, subjecting the city to stricter oversight and supervision of emissions. It’s not surprising — San Antonio is a large, sprawling city, bisected by multiple freeways and highways. A car is basically required to get anywhere in the city.
The San Antonio Report just published an article today about how smoggy San Antonio is and how the polluted air harms residents’ health. The city will be subject to further oversight that could potentially harm businesses and the economy, at least according to business owners and some politicians. On the other hand, this terrible air quality harms residents and impacts their ability to actually be outside and travel to businesses, so it seems like it’s in everyone’s best interest to have clean air.
One item in the report I found interesting was that car owners will now have to get their vehicles’ emissions tested yearly, and if there’s one thing Americans love, it’s their big, noxious-fume-spewing trucks. Ozone was specifically cited in the report, but other harmful pollutants like particulate matter are also of serious concern. Poor air quality affects everyone, but especially residents living in poverty and in neglected neighborhoods. Those neighborhoods have higher rates of negative health outcomes, such as asthma, lung disease and various cardiovascular diseases. I’ll be taking a look at all this for this project.
THE DATASETS
San Antonio’s CivTech Datathon published data from their SmartCity sensors and asked the public to take a look at it. I wasn’t able to register a team for the competition but I wanted to take a look at what data they released for myself.
The 3 air quality datasets are massive. There are 16 columns and more than 23,000 observations in each. The sensors took measurements every 3 minutes from April 20, 2021 to May 20, 2021. That’s a solid month’s worth of data, which we can use to extrapolate some larger insights. Some of the columns of interest are measures of PM1.0, PM2.5, PM10, SO2 (sulfur dioxide), O3 (ozone), CO (carbon monoxide), and NO2 (nitrogen dioxide). One of the final columns is called “Alert Triggered” and indicates whether the measurements of these pollutants were high enough to trigger an air quality alert.
PM1.0, PM2.5 and PM10 are all particulate matter small enough to be inhaled, and PM2.5 (fine particulate matter) is associated with the worst health effects. Particulate matter can get deep into your lungs and your bloodstream, which causes all sorts of issues. Particulate matter is particularly insidious in that it’s often made up of a combination of other pollutants in the atmosphere.
Sulfur dioxide (SO2) is put into the atmosphere through the burning of fossil fuels, especially by power plants. Children are particularly susceptible to high levels of SO2 in the atmosphere. Nitrogen dioxide (NO2) is also spewed into the air through emissions and can interact with other particles in the air to form harmful particulate matter. Carbon monoxide (CO) is released into the atmosphere by vehicles and machinery burning fuel. Ozone (O3), when inhaled, interacts negatively with the body’s organs and tissues and is of serious concern for people with respiratory illnesses.
I’m sure these sensors are still collecting data but all that’s available are measurements and readings from April to May. I plan to take a look at levels of each pollutant individually at all three locations, what alerts have been triggered and how often they’re triggered and hopefully put it all together in some easily readable format.
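The cadence and date range described above can be sanity-checked directly from the timestamps. Here is a minimal sketch in R (the language I use for the analysis below), run on made-up timestamps rather than the real data. The `timestamps` vector is a hypothetical stand-in for a sensor’s `DateTime` column, and I’m assuming that column is stored as a date-time.

```r
# Made-up timestamps, 3 minutes apart, standing in for a sensor's DateTime column
timestamps <- as.POSIXct("2021-04-20 00:00:00", tz = "UTC") +
  seq(0, by = 180, length.out = 10)

# Gap between consecutive readings, in minutes
gaps_min <- as.numeric(diff(timestamps), units = "mins")
typical_gap <- median(gaps_min)

# Number of readings expected over a 30-day window at that cadence
expected_n <- 30 * 24 * 60 / typical_gap

typical_gap
expected_n
```

Comparing `expected_n` with the actual row count of each dataset is a quick way to spot gaps in the record or a cadence that differs from the stated one.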
OZONE
For this first pollutant, I’m going to be taking a look at the ozone levels measured by the three sensors. I’ll start with the Brooks City Base sensor and work my way north. I’m using R for this analysis and some of my code is probably clunky, so if there are areas for improvement, someone let me know!
With more than 23,000 observations I’m going to be looking at random samples. I’m going to start with 250 and see how that goes and refine it from there. 250 doesn’t seem like too many observations so in later editions I’ll likely have to adjust my graph axes.
First, I’ll show the code I used to get a random 250 observation sample and how I plotted the observations. Using the “tidyverse” package makes it pretty easy to take a sample of your data. In the “ggplot” section of the code I used the “scale_x_datetime()” function to ensure that the x-axis was easily readable and extended the y-axis below 0 for readability.
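library(tidyverse)  # loads dplyr (sample_n, %>%) and ggplot2

# Brooks City Base: random sample of 250 observations
cosa_brooks_sample <- COSA_Brooks_Air_Quality %>%
  sample_n(250)
ggplot(cosa_brooks_sample, aes(x = DateTime, y = O3)) +
  geom_line() +
  geom_hline(yintercept = 70, colour = "red") +  # red line at the 70 ppb EPA standard
  scale_x_datetime() +
  ylim(-10, 100)

# Downtown: random sample of 250 observations
cosa_downtown_sample <- COSA_Downtown_Air_Quality %>%
  sample_n(250)
ggplot(cosa_downtown_sample, aes(x = DateTime, y = O3)) +
  geom_line() +
  geom_hline(yintercept = 70, colour = "red") +  # red line at the 70 ppb EPA standard
  scale_x_datetime() +
  ylim(-10, 100)

# Medical Center: random sample of 250 observations
cosa_medcenter_sample <- COSA_Medical_Center_Air_Quality %>%
  sample_n(250)
ggplot(cosa_medcenter_sample, aes(x = DateTime, y = O3)) +
  geom_line() +
  geom_hline(yintercept = 70, colour = "red") +  # red line at the 70 ppb EPA standard
  scale_x_datetime() +
  ylim(-10, 100)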
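The three blocks above repeat the same steps, so one possible cleanup is to wrap them in a small helper. This is only a sketch: `plot_o3_sample()`, its `seed` argument and the toy data frame are my own hypothetical additions, and I’m assuming each dataset has `DateTime` (a date-time) and `O3` (numeric) columns as in the code above. Setting a seed makes the random sample reproducible.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: sample n rows and plot O3 over time
plot_o3_sample <- function(data, n = 250, seed = 42) {
  set.seed(seed)  # same sample every run
  data %>%
    sample_n(min(n, nrow(data))) %>%
    ggplot(aes(x = DateTime, y = O3)) +
    geom_line() +
    scale_x_datetime() +
    ylim(-10, 100)
}

# Made-up data standing in for one sensor's dataset
toy <- data.frame(
  DateTime = as.POSIXct("2021-04-20 00:00:00", tz = "UTC") +
    seq(0, by = 180, length.out = 1000),
  O3 = rep(c(0, 0, 0, 5, 80), times = 200)
)

p <- plot_o3_sample(toy)
```

With the real data, the call would look like `plot_o3_sample(COSA_Brooks_Air_Quality)`, repeated once per sensor.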
Below are the plots that were produced:



Unfortunately, the datasets did not come with a data dictionary, so I’m assuming that all the ozone measurements are in parts per billion (ppb), which is traditionally how the EPA measures ozone. In 2015, the EPA lowered the standard for ground-level O3 (the kind most harmful to people) from 75 ppb to 70 ppb, and in 2018 it decided to keep the 2015 standard. The EPA standard is defined on 8-hour average concentrations, so comparing individual sensor readings against 70 ppb is only a rough guide.
The red line in the graphs above indicates the accepted level of O3 as mandated by the EPA. Both the Brooks City Base sensor and the Downtown sensor measured significant levels of O3 during the time for which observations are available. All three sensors recorded significant spikes in O3 during the week of May 3–10, though Brooks City Base and Downtown had higher levels of O3 during that time than the Medical Center. In multiple instances, O3 was higher than the EPA standard. I’m more familiar with the Downtown area than I am with either Brooks or the Medical Center, so it’s easy for me to understand how O3 would be higher there. Downtown San Antonio is small and dense, but there’s still a lot of vehicular traffic. There’s also a lot of construction, and the heavy machinery involved certainly plays a role in ozone formation in the downtown area.
Also, an important factor to remember is that this is just a random sample of 250 observations from the dataset. Perhaps it’s not large enough to get an accurate bigger picture of the information. That got me thinking about how better to visualize this data. I took a closer look and the vast majority of observations show an ozone reading of 0. When I say vast majority…think like 22,000 observations with a value of 0. So I decided to filter and graph for only observations larger than 0.
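The filtering step described above can be sketched roughly like this. The data frame here is made up for illustration, and the point geometry and the red reference line at 70 are my assumptions about how such a plot might be drawn, not a record of the exact code behind the graphs.

```r
library(dplyr)
library(ggplot2)

# Made-up readings standing in for one sensor's dataset
toy <- data.frame(
  DateTime = as.POSIXct("2021-05-03 00:00:00", tz = "UTC") +
    seq(0, by = 180, length.out = 12),
  O3 = c(0, 0, 0, 12, 0, 0, 75, 0, 0, 0, 40, 0)
)

# Keep only the observations with an ozone reading above 0
nonzero <- toy %>%
  filter(O3 > 0)

# Scatterplot of the remaining readings, with a reference line at 70
p <- ggplot(nonzero, aes(x = DateTime, y = O3)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 70, colour = "red") +
  scale_x_datetime()
```

Using some transparency (`alpha`) helps when many readings share the same value and would otherwise hide each other.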



The graphs above show every ozone measurement that was greater than 0. This comes out to roughly 5,000 measurements for each sensor. Obviously it doesn’t look like 5,000 individual points on the graph, most likely because many measurements recorded the same level and so overlap. But the scatterplots above show the distribution of ozone measurements in those 3 areas more accurately.
Next I looked at how many times O3 levels were above the EPA standard for the 3 sensor areas. Using the “count” verb I pulled the number of times that the sensors measured O3 levels above 70.
cosa_brooks_O3 %>% # How often O3 levels exceeded 70 at Brooks sensor
  count(O3 > 70)
# A tibble: 2 x 2
  `O3 > 70`     n
  <lgl>     <int>
1 FALSE     22643
2 TRUE        432

cosa_downtown_O3 %>% # How often O3 levels exceeded 70 at Downtown sensor
  count(O3 > 70)
# A tibble: 2 x 2
  `O3 > 70`     n
  <lgl>     <int>
1 FALSE     22887
2 TRUE        492

cosa_medcenter_O3 %>% # How often O3 levels exceeded 70 at MedCenter sensor
  count(O3 > 70)
# A tibble: 2 x 2
  `O3 > 70`     n
  <lgl>     <int>
1 FALSE     23247
2 TRUE        132
The tables above show that the Brooks City Base sensor and the Downtown sensor experience high levels of O3 much more frequently than the Medical Center. Over a 30-day period, the Downtown sensor measures O3 levels above 70 about 16 times a day, compared to only 4 times a day at the Medical Center.
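The per-day figures come from simple division, which can be reproduced with the counts in the tables above. The 30-day window is the assumption already used in the text, and the share-of-readings line is an extra view of the same counts that I’m adding for context.

```r
# Counts copied from the count() tables above
exceedances <- c(Brooks = 432, Downtown = 492, MedCenter = 132)
totals <- c(Brooks   = 22643 + 432,
            Downtown = 22887 + 492,
            MedCenter = 23247 + 132)

days <- 30  # assumed length of the observation window

# Average number of readings above 70 per day
per_day <- round(exceedances / days, 1)
per_day
# Brooks 14.4, Downtown 16.4, MedCenter 4.4

# Share of all readings that were above 70, in percent
pct_of_readings <- round(100 * exceedances / totals, 2)
pct_of_readings
```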
How many alerts for ozone were triggered at these sensors?
cosa_brooks_test %>%
  count(AlertTriggered == "o3")
# A tibble: 2 x 2
  `AlertTriggered == "o3"`     n
  <lgl>                    <int>
1 TRUE                         5
2 NA                        5760

cosa_downtown_test %>%
  count(AlertTriggered == "o3")
# A tibble: 2 x 2
  `AlertTriggered == "o3"`     n
  <lgl>                    <int>
1 TRUE                         4
2 NA                        5760

cosa_med_center_test %>%
  count(AlertTriggered == "o3")
# A tibble: 2 x 2
  `AlertTriggered == "o3"`     n
  <lgl>                    <int>
1 TRUE                         7
2 NA                        5760
The number of alerts triggered for ozone during the datasets’ timeframe looks a little low, definitely lower than I thought it would be, given the number of times O3 levels were measured above 70.
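One thing worth noting about the tables above is the `NA` row. In R, comparing a missing value with `==` returns `NA` rather than `FALSE`, so rows where `AlertTriggered` is empty get counted under `NA`. Here is a small sketch on made-up data showing the difference between `==` and `%in%`. The `"pm25"` label is hypothetical, since I don’t know what other values the column can take.

```r
library(dplyr)

# Made-up rows: two with no alert, one ozone alert, one other (hypothetical) alert
toy <- data.frame(
  O3 = c(0, 3, 80, 5),
  AlertTriggered = c(NA, "o3", NA, "pm25"),
  stringsAsFactors = FALSE
)

# `==` keeps missing values as NA, so this yields FALSE, TRUE and NA groups
with_na <- toy %>%
  count(AlertTriggered == "o3")

# `%in%` treats a missing value as "not an ozone alert", giving only FALSE and TRUE
no_na <- toy %>%
  count(o3_alert = AlertTriggered %in% "o3")

with_na
no_na
```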
Now let’s see the date, time and level of O3 measured when the alerts were triggered.
cosa_brooks_test %>%
  select(DateTime, O3, AlertTriggered) %>%
  filter(AlertTriggered == "o3")
# A tibble: 5 x 3
  DateTime               O3 AlertTriggered
  <dttm>              <dbl> <chr>
1 2021-05-10 13:50:46    20 o3
2 2021-05-09 07:19:07     5 o3
3 2021-05-07 03:05:46     5 o3
4 2021-04-29 10:06:18     3 o3
5 2021-04-26 05:10:46     3 o3

cosa_downtown_test %>%
  select(DateTime, O3, AlertTriggered) %>%
  filter(AlertTriggered == "o3")
# A tibble: 4 x 3
  DateTime               O3 AlertTriggered
  <dttm>              <dbl> <chr>
1 2021-04-24 04:20:34    10 o3
2 2021-04-24 04:35:43    10 o3
3 2021-04-24 04:45:12    10 o3
4 2021-04-24 04:51:31    10 o3

cosa_med_center_test %>%
  select(DateTime, O3, AlertTriggered) %>%
  filter(AlertTriggered == "o3")
# A tibble: 7 x 3
  DateTime               O3 AlertTriggered
  <dttm>              <dbl> <chr>
1 2021-05-13 19:23:43     3 o3
2 2021-05-04 06:03:39     1 o3
3 2021-04-30 13:40:03     1 o3
4 2021-04-27 00:03:08     8 o3
5 2021-04-22 02:13:19     3 o3
6 2021-04-22 17:48:01     3 o3
7 2021-04-20 04:02:24     3 o3
Interestingly, the alert was never triggered when O3 levels were 70 or greater. This makes me think that either something is wrong with the sensor, or more likely, there’s something I’m missing in the data.
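One way to check this mismatch more directly is to cross-tabulate the two conditions: whether a reading was above 70 and whether an ozone alert was recorded on that row. This is a rough sketch on made-up data, assuming the same `O3` and `AlertTriggered` columns as above. The new column names `exceeds` and `o3_alert` are mine.

```r
library(dplyr)

# Made-up rows standing in for one sensor's dataset
toy <- data.frame(
  O3 = c(0, 3, 80, 5, 95, 10),
  AlertTriggered = c(NA, "o3", NA, "o3", NA, NA),
  stringsAsFactors = FALSE
)

# Count every combination of "reading above 70" and "ozone alert recorded"
cross_tab <- toy %>%
  mutate(
    exceeds  = O3 > 70,
    o3_alert = AlertTriggered %in% "o3"
  ) %>%
  count(exceeds, o3_alert)

cross_tab

# Rows where both conditions hold at once
both <- cross_tab %>%
  filter(exceeds, o3_alert)
```

If `both` comes back empty on the real data, as it does here, that would confirm that alerts and high readings never coincide on the same row, which points to the alert meaning something other than what I assumed.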
That’s a brief look at ozone levels measured by these new SmartCity sensors here in San Antonio. Obviously, the city has a lot of work to do to control ground-level ozone. But individual actions matter, too. If you can, take the bus, ride your bike or walk to wherever you’re headed. You’ll enjoy the trip more and if you can get some exercise in, too, it’ll be worth it. Stay tuned for future parts on the other pollutants measured by these sensors!
Smoggy San Antonio facing further regulation as air quality continues to suffer