Chemical Concentrations throughout the Neighborhood and throughout the Day

Once again, I am going to focus on the atmospheric concentration data included in the Local Sense Labs sensor dataset. I have discussed in previous assignments that although the atmospheric concentration data appears to vary throughout the day and between the two sensor locations, we have not actually shown that there is a statistically significant difference in either case. In this assignment, I will investigate whether the concentration is genuinely different between the Environ1 and Environ2 locations, and whether there is genuine variability throughout the day. (This analysis is somewhat limited, because we have not verified that the two sensors are calibrated so that their values are comparable to each other, and the units are still in voltage. For our purposes, we’ll assume that the data have been calibrated previously.) The following code demonstrates the initial setup for our data:

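library("dplyr")
library("ggplot2")
library("gridExtra")

envi <- read.csv("sensor_data_environment-MT.csv")
# Drop duplicate readings. In dplyr >= 0.5, distinct() returns only the named
# columns unless .keep_all = TRUE, and the positional selection below needs them all.
envi <- distinct(envi, id_wasp, sensor, value, timestamp, .keep_all = TRUE)
# Keep only the columns used below, selected by position
envi <- envi[, c(3, 4, 11, 5)]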

First, we will consider the two locations of the Environment sensors, and we’ll investigate each chemical separately. In this case, we are simply interested in whether the binary variable id_wasp influences the value of the chemical concentration, and so, a simple t-test will provide the information we need.

# Two-sample t-test of value by location, one chemical at a time
t.no2 <- t.test(value ~ id_wasp, data = envi[envi$sensor == "NO2", ])
t.co  <- t.test(value ~ id_wasp, data = envi[envi$sensor == "CO", ])
t.o2  <- t.test(value ~ id_wasp, data = envi[envi$sensor == "O2", ])
t.co2 <- t.test(value ~ id_wasp, data = envi[envi$sensor == "CO2", ])

The results of these tests are summarized below:

                    NO2         CO          O2          CO2
t statistic         4.5403      -22.8262    48.8681     -16.5273
Degrees of freedom  1178.709    1010.852    1158.722    894.672
p-value             6.195e-06   < 2.2e-16   < 2.2e-16   < 2.2e-16
Environ1 mean       2.010362    1.913590    0.5815106   0.8431141
Environ2 mean       2.005268    2.404169    0.5425828   0.9847937

Note that the difference between the Environ1 and Environ2 means for NO2 is very small, especially compared to the CO data. Even so, the t statistics (which reflect the difference in means relative to the variation of the data) are all well away from zero, and the p-values calculated in all four tests are exceedingly small. So, we can conclude that the mean concentration of each chemical differs between the two locations. The differences in the means, even the small one for NO2, are unlikely to have occurred by chance.
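To illustrate how a small difference in means can still produce a very small p-value, here is a minimal sketch on simulated data. The numbers are made up and are not the sensor measurements. The `cohens_d` helper is a hypothetical addition of mine, used only to put the difference on a standardized scale next to the t-test output.

```r
# Simulated stand-in data (NOT the sensor measurements): two locations whose
# means differ by a small amount relative to the spread of the readings.
set.seed(1)
n <- 600
sim <- data.frame(
  id_wasp = rep(c("Environ1", "Environ2"), each = n),
  value   = c(rnorm(n, mean = 10.0, sd = 1), rnorm(n, mean = 10.2, sd = 1))
)

# Welch two-sample t-test (the default for t.test) via the formula interface
tt <- t.test(value ~ id_wasp, data = sim)

# Hypothetical helper: Cohen's d using a pooled standard deviation
cohens_d <- function(x, g) {
  g <- factor(g)
  a <- x[g == levels(g)[1]]
  b <- x[g == levels(g)[2]]
  pooled_sd <- sqrt(((length(a) - 1) * var(a) + (length(b) - 1) * var(b)) /
                      (length(a) + length(b) - 2))
  (mean(a) - mean(b)) / pooled_sd
}

d <- cohens_d(sim$value, sim$id_wasp)

cat("t =", unname(tt$statistic), " p =", tt$p.value, " d =", d, "\n")
```

With many readings per location, the p-value mostly tells us that a difference exists. A standardized effect size like this is one way to judge how large that difference is.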

 

What about the variability of each chemical throughout the day? Is there a significant difference between NO2 concentration in the middle of the day versus the night? We previously added a flag to our dataset that indicates whether the measurement was taken in the morning, afternoon/evening, or at night. We can use this flag with the ANOVA test to determine if there is a significant difference in the measurements in these three times of day.

# One-way ANOVA of value by time-of-day flag, one chemical at a time
aov.no2 <- aov(value ~ day_flg, data = envi[envi$sensor == "NO2", ])
aov.co  <- aov(value ~ day_flg, data = envi[envi$sensor == "CO", ])
aov.o2  <- aov(value ~ day_flg, data = envi[envi$sensor == "O2", ])
aov.co2 <- aov(value ~ day_flg, data = envi[envi$sensor == "CO2", ])

The results are summarized below:

        F Value   Probability >F
NO2     27.34     2.47e-12
CO      305.4     <2e-16
O2      6.55      0.00148
CO2     70.31     <2e-16

In all four cases, the F value is greater than 1, and for CO and CO2 in particular it is much larger than 1. This indicates that the variability between the groups is larger than the variability within each group. Combined with the very small probabilities, we can conclude that there is a statistically significant difference between the concentrations in the morning, late day, and nighttime hours. Another way to frame this is that, for NO2, the variability between the time-of-day group means is about 27 times what we would expect from the within-group variability alone. The probability >F is the probability of seeing an F value at least this large if time of day had no effect, and in all four cases this probability is again exceedingly small.
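To make the between-group versus within-group comparison concrete, the following sketch recomputes the F value by hand and checks it against aov. It uses simulated data, not the sensor data. The time-of-day labels are placeholders, since the actual values of day_flg are not shown here.

```r
# Simulated stand-in data (NOT the sensor measurements); labels are placeholders
set.seed(2)
sim <- data.frame(
  day_flg = factor(rep(c("morning", "afternoon", "night"), each = 50)),
  value   = c(rnorm(50, 10.0), rnorm(50, 10.5), rnorm(50, 10.2))
)

fit <- aov(value ~ day_flg, data = sim)
tab <- summary(fit)[[1]]   # ANOVA table: row 1 = day_flg, row 2 = Residuals

grand_mean  <- mean(sim$value)
group_means <- tapply(sim$value, sim$day_flg, mean)
group_n     <- tapply(sim$value, sim$day_flg, length)

# Between-group sum of squares: spread of the group means around the grand mean
ss_between <- sum(group_n * (group_means - grand_mean)^2)
# Within-group sum of squares: spread of each reading around its own group mean
ss_within  <- sum((sim$value - group_means[as.character(sim$day_flg)])^2)

k <- length(group_means)   # number of groups
N <- nrow(sim)             # total number of readings

# F is the ratio of the between-group to the within-group mean square
f_by_hand <- (ss_between / (k - 1)) / (ss_within / (N - k))
# Probability of an F at least this large if the group means were all equal
p_by_hand <- pf(f_by_hand, k - 1, N - k, lower.tail = FALSE)

cat("F by hand:", f_by_hand, " F from aov:", tab[1, "F value"], "\n")
cat("p by hand:", p_by_hand, " p from aov:", tab[1, "Pr(>F)"], "\n")
```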

 

The F test does not tell us which times of day have high or low concentrations of these chemicals. To dig into this, we will visualize the mean concentration for each time period in the day. First we’ll calculate the means and standard errors for each case (sensor location, chemical, and time of day), and then we’ll create bar plots for each chemical so that we can compare both the sensor location and the time of day.

Interestingly, the standard error bars in these plots are almost too small to see, because the data are so tightly centered around each mean value. This is consistent with the results of the ANOVA test: the within-group variability is quite small. Meanwhile, the differences in bar height across times of day, which roughly reflect the between-group variance, are comparatively large. This is most apparent for the CO and CO2 data.

For the NO2 data, both the between-group and within-group variance are difficult to see because all the data are so tightly grouped. The ANOVA test and t-test found the differences between times of day and between sensor locations to be significant for NO2, but the t statistic and F value were rather small compared with those for CO and CO2. For the O2 data, the difference between the sensor locations is notable, but the difference between the times of day is very small, which matches the high t statistic but comparatively low F value for O2.

[Figure: mean concentration by sensor location and time of day, one panel per chemical]

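# Mean and standard error for each location / chemical / time-of-day cell
means <- aggregate(value ~ id_wasp + sensor + day_flg, data = envi, mean)
names(means)[4] <- 'Mean'
# sum(!is.na(x)) counts the non-missing values; length(!is.na(x)) would count all of them
ses <- aggregate(value ~ id_wasp + sensor + day_flg, data = envi,
                 function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x))))
names(ses)[4] <- 'SE'
means <- merge(means, ses, by = c("id_wasp", "sensor", "day_flg"))
# Error-bar limits: one standard error either side of the mean
means <- transform(means, lower = Mean - SE, upper = Mean + SE)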

# One dodged bar plot per chemical: location on the x axis, bars colored by time of day
bar1 <- ggplot(data = means[means$sensor == 'CO', ], aes(x = id_wasp, y = Mean, fill = day_flg)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_errorbar(aes(ymax = upper, ymin = lower), position = position_dodge(0.9)) +
  guides(fill = guide_legend(title = "Time of Day")) +
  labs(title = "CO", x = NULL, y = "Mean")

bar2 <- ggplot(data = means[means$sensor == 'NO2', ], aes(x = id_wasp, y = Mean, fill = day_flg)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_errorbar(aes(ymax = upper, ymin = lower), position = position_dodge(0.9)) +
  guides(fill = guide_legend(title = "Time of Day")) +
  labs(title = "NO2", x = NULL, y = "Mean")

bar3 <- ggplot(data = means[means$sensor == 'CO2', ], aes(x = id_wasp, y = Mean, fill = day_flg)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_errorbar(aes(ymax = upper, ymin = lower), position = position_dodge(0.9)) +
  guides(fill = guide_legend(title = "Time of Day")) +
  labs(title = "CO2", x = NULL, y = "Mean")

bar4 <- ggplot(data = means[means$sensor == 'O2', ], aes(x = id_wasp, y = Mean, fill = day_flg)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_errorbar(aes(ymax = upper, ymin = lower), position = position_dodge(0.9)) +
  guides(fill = guide_legend(title = "Time of Day")) +
  labs(title = "O2", x = NULL, y = "Mean")

# Arrange the four panels in a 2 x 2 grid
grid.arrange(bar1, bar2, bar3, bar4, ncol = 2)
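The bar plots give a visual sense of which time periods differ. As a complementary check that I did not run on the sensor data, a post-hoc test such as Tukey’s HSD (TukeyHSD in base R’s stats package) reports each pairwise difference with an adjusted p-value. Here is a minimal sketch on simulated data, with placeholder time-of-day labels:

```r
# Simulated stand-in data (NOT the sensor measurements); labels are placeholders
set.seed(3)
sim <- data.frame(
  day_flg = factor(rep(c("morning", "afternoon", "night"), each = 50),
                   levels = c("morning", "afternoon", "night")),
  value   = c(rnorm(50, 10.0), rnorm(50, 10.6), rnorm(50, 10.1))
)

fit <- aov(value ~ day_flg, data = sim)

# Pairwise comparisons of the time-of-day means with family-wise 95% intervals
tk <- TukeyHSD(fit, "day_flg", conf.level = 0.95)

# One row per pair of periods: difference, lower bound, upper bound, adjusted p-value
print(tk$day_flg)
```

On the real data this would be run once per chemical, in the same way as the aov calls earlier. The grouping variable needs to be a factor for TukeyHSD to pick it up.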

 

We can also visualize the periodic average as a departure from the daily average. This will help us detect any differences in the variability throughout the day in one location versus the other. The process is similar to our previous visualization, but this time we’ll calculate the daily mean for each sensor and location, and create bar plots that show the departure from the daily mean for each time period. For NO2, CO2, and O2, the distinctive feature is that both locations have approximately the same pattern throughout the day, just scaled differently based on location. For example, NO2 is above average in the afternoon, well below average in the morning, and below but closer to average overnight; both locations exhibit this pattern, except that Environ2 is closer to average overall. This effect is muted in the O2 plot because all values are so similar for O2. However, in the CO plot, note that the two locations exhibit different behavior during the overnight hours. For Environ1, the concentrations are below average overnight, while for Environ2 they are above average overnight. This suggests that there may be some interaction between the influence of location and the influence of time of day, possibly caused by a physical process such as diffusion or some other type of chemical transfer.

[Figure: departure from the daily mean by sensor location and time of day, one panel per chemical]

# Daily mean for each location and chemical
dailymean <- aggregate(value ~ id_wasp + sensor, data = envi, mean)
names(dailymean)[3] <- 'DailyMean'
plotdata <- merge(means, dailymean, by = c("id_wasp", "sensor"))

# Same plots as before, but with every quantity shifted by the daily mean.
# ymax uses the upper limit and ymin the lower limit.
bar1 <- ggplot(data = plotdata[plotdata$sensor == 'CO', ], aes(x = id_wasp, y = (Mean - DailyMean), fill = day_flg)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_errorbar(aes(ymax = (upper - DailyMean), ymin = (lower - DailyMean)), position = position_dodge(0.9)) +
  guides(fill = guide_legend(title = "Time of Day")) +
  labs(title = "CO", x = NULL, y = "Departure from Mean")

bar2 <- ggplot(data = plotdata[plotdata$sensor == 'NO2', ], aes(x = id_wasp, y = (Mean - DailyMean), fill = day_flg)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_errorbar(aes(ymax = (upper - DailyMean), ymin = (lower - DailyMean)), position = position_dodge(0.9)) +
  guides(fill = guide_legend(title = "Time of Day")) +
  labs(title = "NO2", x = NULL, y = "Departure from Mean")

bar3 <- ggplot(data = plotdata[plotdata$sensor == 'CO2', ], aes(x = id_wasp, y = (Mean - DailyMean), fill = day_flg)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_errorbar(aes(ymax = (upper - DailyMean), ymin = (lower - DailyMean)), position = position_dodge(0.9)) +
  guides(fill = guide_legend(title = "Time of Day")) +
  labs(title = "CO2", x = NULL, y = "Departure from Mean")

bar4 <- ggplot(data = plotdata[plotdata$sensor == 'O2', ], aes(x = id_wasp, y = (Mean - DailyMean), fill = day_flg)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_errorbar(aes(ymax = (upper - DailyMean), ymin = (lower - DailyMean)), position = position_dodge(0.9)) +
  guides(fill = guide_legend(title = "Time of Day")) +
  labs(title = "O2", x = NULL, y = "Departure from Mean")

grid.arrange(bar1, bar2, bar3, bar4, ncol = 2)

 

We can test for this interaction between factors (location and time of day) using a two-way ANOVA test, as follows:

# Two-way ANOVA: day_flg * id_wasp expands to both main effects plus their interaction
aov2.no2 <- aov(value ~ day_flg * id_wasp, data = envi[envi$sensor == "NO2", ])
aov2.co  <- aov(value ~ day_flg * id_wasp, data = envi[envi$sensor == "CO", ])
aov2.o2  <- aov(value ~ day_flg * id_wasp, data = envi[envi$sensor == "O2", ])
aov2.co2 <- aov(value ~ day_flg * id_wasp, data = envi[envi$sensor == "CO2", ])

        F Value of Interaction   Probability >F for Interaction
NO2     1.513                    0.221
CO      17.09                    4.8e-08
O2      3.029                    0.0488
CO2     46.62                    <2e-16

The test supports our hypothesis that the location and time of day interact to influence the concentration values, with a particularly large F value for CO2, and a more moderate but still highly significant value for CO. For NO2, the Probability >F of 0.221 gives no evidence that the two factors interact. For O2, the value of 0.0488 falls just under the conventional 0.05 threshold, so any interaction there is marginal at best.
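For reference, here is a rough sketch of how the interaction row can be pulled out of a two-way ANOVA table. It uses simulated data, not the sensor data, with a deliberately built-in interaction: the overnight shift has the opposite sign at the two locations, loosely mimicking the CO pattern described above. The time-of-day labels and effect sizes are placeholders.

```r
# Simulated stand-in data (NOT the sensor measurements); labels are placeholders
set.seed(4)
design <- expand.grid(
  rep     = 1:40,
  day_flg = c("morning", "afternoon", "night"),
  id_wasp = c("Environ1", "Environ2"),
  stringsAsFactors = TRUE
)

# Built-in interaction: the night-time shift has opposite signs at the two locations
shift <- with(design, ifelse(day_flg == "night",
                             ifelse(id_wasp == "Environ1", -0.5, 0.5), 0))
design$value <- rnorm(nrow(design), mean = 10 + shift, sd = 1)

fit <- aov(value ~ day_flg * id_wasp, data = design)
tab <- summary(fit)[[1]]   # rows: day_flg, id_wasp, day_flg:id_wasp, Residuals

# The third row holds the interaction term
print(tab[3, c("Df", "F value", "Pr(>F)")])

# Cell means show the pattern behind the interaction term
cell_means <- with(design, tapply(value, list(day_flg, id_wasp), mean))
print(round(cell_means, 2))
```

Printing the cell means next to the F value shows what the interaction actually looks like, rather than only whether it is significant.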
