### Introduction

In the previous post we saw graphically, in scatter plots, how similar the measurements of the sensors are. We also saw that, to test the feasibility of creating the Livability index, it is important to check that one sensor's measurements differ from the other's.

In this post we go further in this direction and test whether the differences observed in the scatter plots are significant by running a t-test for every measurement across both sensors. We also test whether these measurements are constant across days.

### Materials

For this analysis we used an extended dataset with three days of data for ambient variables (10 to 12 August 2016) and 10 days for ambient variables (10 to 19 August 2016). The data were aggregated into 5-minute intervals.
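The post does not show how the 5-minute aggregation was done. As a minimal sketch of one possible approach, the snippet below bins simulated readings into 5-minute windows and averages them. The data frame `raw` and its columns `time` and `value` are hypothetical and are not part of the original dataset.

```r
# Simulated raw readings (hypothetical): one value every 30 seconds for one hour
set.seed(1)
raw <- data.frame(
  time  = as.POSIXct("2016-08-10 00:00:00", tz = "UTC") + seq(0, 3570, by = 30),
  value = rnorm(120, mean = 25, sd = 1)
)

# Round each timestamp down to the start of its 5-minute (300-second) window
raw$bin <- as.POSIXct(floor(as.numeric(raw$time) / 300) * 300,
                      origin = "1970-01-01", tz = "UTC")

# Average the readings that fall inside each window
agg <- aggregate(value ~ bin, data = raw, FUN = mean)
head(agg)
```

Each row of `agg` then holds the mean of the readings that fell inside one 5-minute window.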

### Exploratory Analysis

The first step is to draw bar charts to get a better idea of where the similarities and differences between the sensors lie. The following figures show bar charts of the mean value of each measurement for every available day.

Figure 1. Comparison of mean values for sensors Envi1 and Envi2 (R code in Annex 1)

Figure 2. Comparison of mean values for sensors City1 and City2 (R code in Annex 2)

From Figures 1 and 2 we can see that most measurements differ from one sensor to the other, with the exception of NO2, which seems to be very similar. To test these differences we carry out a paired t-test between the sensors. The results are shown in Table 1.

### Testing the differences between sensors

Table 1 shows the mean difference between the measurements taken by the two sensors, together with the t statistic and the p-value of the paired t-test on that difference.

Table 1: t-tests between measurements of City1 vs City2 and Env1 vs Env2. If the measurements of the two sensors are equal, we expect their mean difference (MeanDif) to be close to zero (R code in Annex 3).

As suggested above, Table 1 shows that NO2 is the only measurement for which the two sensors do not differ significantly. This means that, although the distance between the sensors is short, the environmental and atmospheric conditions change significantly from one spot to the other.

As we saw in previous posts, there could be subtle demographic differences in the composition of the blocks where the sensors are located. For example, the block where City1 and Environment1 are located is surrounded by department stores, so a more intense pedestrian flow is expected. On the other hand, the place where Environment2 and City2 are located has a lower pedestrian intensity. These demographic variables might therefore be responsible for some of the differences in the sensors' measurements. To test this, we need to add external demographic variables and see whether we can explain or predict them from the sensor measurements.
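To make the three columns of Table 1 concrete, here is a minimal sketch on simulated data (not the real sensor readings). It shows that a paired t-test is the same as a one-sample t-test on the differences, which is why the table reports a mean difference. The vectors `sensor1` and `sensor2` are hypothetical.

```r
# Simulated paired readings (hypothetical): both sensors share a common signal,
# and sensor1 carries an extra offset of about 0.5 units
set.seed(42)
n <- 200
shared  <- rnorm(n, mean = 25, sd = 3)
sensor1 <- shared + rnorm(n, mean = 0.5, sd = 0.4)
sensor2 <- shared + rnorm(n, mean = 0,   sd = 0.4)

# Paired t-test: the null hypothesis is that the mean difference is zero
paired <- t.test(sensor1, sensor2, paired = TRUE)

# Equivalent one-sample t-test on the element-wise differences
one_sample <- t.test(sensor1 - sensor2, mu = 0)

# One row in the same layout as Table 1
data.frame(MeanDif   = paired$estimate[[1]],
           statistic = paired$statistic[[1]],
           pvalue    = paired$p.value)
```

A large p-value here would only mean that the test did not detect a difference. It is not, by itself, proof that the two sensors measure the same thing.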

### Testing the differences between sensors along days

So far we have seen that there are significant differences between the sensors, but are those differences stable over time?

To address this question we ran an F test across days for each measurement, and another for the difference between the measurements of the two sensors. The results are shown in Table 2 for the City measurements and Table 3 for the environmental measurements.

Table 2: p-values of the F tests. City1/City2: mean value across days. Dif: difference between the values of City1 and City2 across days. Ref: p-values below 0.05 are taken to indicate differences across days.

Table 3: p-values of the F tests. Env1/Env2: mean value across days. Dif: difference between the values of Env1 and Env2 across days. Ref: p-values below 0.05 are taken to indicate differences across days.

Table 2 shows that the difference between the two sensors remains stable across days for Luminosity and Humidity, but not for Temperature and Noise. Table 3 shows that the differences between the environmental measurements are not stable across days.
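The logic of the "Dif" column can be illustrated with a small sketch on simulated data (again, not the real readings). A one-way ANOVA of the between-sensor difference against day gives a small p-value when the difference drifts from day to day. The data frame `sim` and its columns are hypothetical.

```r
# Simulated between-sensor differences over three days (hypothetical data)
set.seed(7)
days <- factor(rep(c("10", "11", "12"), each = 100))

# Case 1: the difference has the same mean every day
stable_dif <- rnorm(300, mean = 1, sd = 0.5)

# Case 2: the mean difference grows by one unit per day
shift <- unname(c("10" = 0, "11" = 1, "12" = 2)[as.character(days)])
drifting_dif <- rnorm(300, mean = 1, sd = 0.5) + shift

sim <- data.frame(day = days, stable_dif, drifting_dif)

# p-value of the F test from a one-way ANOVA of `variable` against day
p_value <- function(variable) {
  fit <- aov(reformulate("day", response = variable), data = sim)
  summary(fit)[[1]]$`Pr(>F)`[1]
}

p_stable <- p_value("stable_dif")    # difference with a constant mean
p_drift  <- p_value("drifting_dif")  # difference that changes across days
c(stable = p_stable, drifting = p_drift)
```

Note that `day` is a factor here, so the F test compares the mean of each day rather than fitting a linear trend.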

### Conclusion

In this post we tested the differences between the sensors to examine whether the distance between them, although small, is sufficient to register behavioral differences between two spots in the city. The results support the feasibility of building an index based on these variables. We also noted the following sub-conclusions:

- NO2 is the only variable that does not show significant differences between the sensors.
- Luminosity and Humidity are the only two variables whose between-sensor differences remain constant across days.
- Only Luminosity from both sensors and Noise from City1 show the same values across days.

### Annexes

1-Bar charts: Environmental sensors

```r
library(ggplot2)   # ggplot, geom_bar, geom_errorbar
library(reshape2)  # melt

# Bar chart of the daily mean (with standard-error bars) for a pair of sensors.
# x: melted data frame (day, variable, value); name: plot title;
# sensorname: labels for the two sensors.
graph <- function(x, name, sensorname) {
  names(x) <- c("day", "sensor", "value")
  AggDaySensorVal <- aggregate(value ~ day + sensor, x, mean)
  # Standard error of the mean, counting only non-missing values
  se <- aggregate(value ~ day + sensor, x,
                  function(v) sd(v, na.rm = TRUE) / sqrt(sum(!is.na(v))))
  AggDaySensorVal <- merge(AggDaySensorVal, se, by = c("day", "sensor"),
                           suffixes = c("", "Se"))
  AggDaySensorVal <- transform(AggDaySensorVal,
                               upper = value + valueSe,
                               lower = value - valueSe)
  levels(AggDaySensorVal$sensor) <- sensorname
  AggDaySensorVal$day <- as.factor(AggDaySensorVal$day)
  ggplot(data = AggDaySensorVal, aes(x = day, y = value, fill = sensor)) +
    geom_bar(stat = "identity", position = "dodge") +
    geom_errorbar(aes(ymax = upper, ymin = lower),
                  position = position_dodge(.9)) +
    ylab("Mean") +
    ggtitle(name)
}

melted <- melt(combine[, c("day", "Envi1CO", "Envi2CO")], id.vars = "day")
EnviGraph1 <- graph(melted, "CO", c("Envi1", "Envi2"))

melted <- melt(combine[, c("day", "Envi1CO2", "Envi2CO2")], id.vars = "day")
EnviGraph2 <- graph(melted, "CO2", c("Envi1", "Envi2"))

melted <- melt(combine[, c("day", "Envi1NO2", "Envi2NO2")], id.vars = "day")
EnviGraph3 <- graph(melted, "NO2", c("Envi1", "Envi2"))

melted <- melt(combine[, c("day", "Envi1O2", "Envi2O2")], id.vars = "day")
EnviGraph4 <- graph(melted, "O2", c("Envi1", "Envi2"))
```

2-Bar charts: City sensors

```r
# Uses graph() from Annex 1
melted <- melt(combine[, c("day", "City1LUM", "City2LUM")], id.vars = "day")
CityGraph1 <- graph(melted, "Luminosity", c("City1", "City2"))

melted <- melt(combine[, c("day", "City1HUM", "City2HUM")], id.vars = "day")
CityGraph2 <- graph(melted, "Humidity", c("City1", "City2"))

melted <- melt(combine[, c("day", "City1TC", "City2TC")], id.vars = "day")
CityGraph3 <- graph(melted, "Temperature", c("City1", "City2"))

melted <- melt(combine[, c("day", "City1MCP", "City2MCP")], id.vars = "day")
CityGraph4 <- graph(melted, "Noise", c("City1", "City2"))
```

3-t.test among the measurements

```r
library(stargazer)  # LaTeX table output

options(digits = 4)

# Paired t-test between two sensor columns of `combine`, returned as one table row
pairedRow <- function(label, col1, col2) {
  temp <- t.test(combine[[col1]], combine[[col2]], paired = TRUE)
  data.frame(measurement = label,
             MeanDif     = temp$estimate[[1]],   # mean of the paired differences
             statistic   = temp$statistic[[1]],  # t statistic
             pvalue      = temp$p.value)
}

results <- rbind(
  pairedRow("Luminosity",  "City1LUM", "City2LUM"),
  pairedRow("Humidity",    "City1HUM", "City2HUM"),
  pairedRow("Temperature", "City1TC",  "City2TC"),
  pairedRow("Noise",       "City1MCP", "City2MCP"),
  pairedRow("CO",          "Envi1CO",  "Envi2CO"),
  pairedRow("CO2",         "Envi1CO2", "Envi2CO2"),
  pairedRow("NO2",         "Envi1NO2", "Envi2NO2"),
  pairedRow("O2",          "Envi1O2",  "Envi2O2")
)

stargazer(results, header = FALSE, type = "latex", summary = FALSE,
          title = "t.test between measurement of City1 vs City2 and Env1 vs Env2")
```

4-F.test among the measurements

```r
library(stargazer)  # LaTeX table output

# Differences between paired sensors
combine$TempDif <- combine$City1TC  - combine$City2TC
combine$HumDif  <- combine$City1HUM - combine$City2HUM
combine$LumDif  <- combine$City1LUM - combine$City2LUM
combine$McpDif  <- combine$City1MCP - combine$City2MCP
combine$CODif   <- combine$Envi1CO  - combine$Envi2CO
combine$CO2Dif  <- combine$Envi1CO2 - combine$Envi2CO2
combine$NO2Dif  <- combine$Envi1NO2 - combine$Envi2NO2
combine$O2Dif   <- combine$Envi1O2  - combine$Envi2O2

# Assumption: day is meant to be categorical, so that aov() compares the mean
# of each day. This is a no-op if day is already a factor.
combine$day <- as.factor(combine$day)

# p-value of the F test (one-way ANOVA) of one variable against day
dayPvalue <- function(variable) {
  fit <- aov(reformulate("day", response = variable), data = combine)
  summary(fit)[[1]]$`Pr(>F)`[1]
}

# One row per measurement: sensor 1, sensor 2, and their difference
cityVars <- list(Lum   = c("City1LUM", "City2LUM", "LumDif"),
                 Hum   = c("City1HUM", "City2HUM", "HumDif"),
                 Tc    = c("City1TC",  "City2TC",  "TempDif"),
                 Noise = c("City1MCP", "City2MCP", "McpDif"))
envVars  <- list(CO  = c("Envi1CO",  "Envi2CO",  "CODif"),
                 CO2 = c("Envi1CO2", "Envi2CO2", "CO2Dif"),
                 O2  = c("Envi1O2",  "Envi2O2",  "O2Dif"),
                 NO2 = c("Envi1NO2", "Envi2NO2", "NO2Dif"))

resultsCity <- t(sapply(cityVars, function(v) sapply(v, dayPvalue)))
resultsEnv  <- t(sapply(envVars,  function(v) sapply(v, dayPvalue)))

resultsCity <- data.frame(round(resultsCity, 3))
resultsEnv  <- data.frame(round(resultsEnv, 3))
names(resultsCity) <- c("City1", "City2", "Dif")
names(resultsEnv)  <- c("Env1", "Env2", "Dif")

stargazer(resultsCity, header = FALSE, type = "latex", summary = FALSE,
          title = "p-values for F tests across days: City1 vs City2")
stargazer(resultsEnv, header = FALSE, type = "latex", summary = FALSE,
          title = "p-values for F tests across days: Env1 vs Env2")
```